                             ==Phrack Inc.==

              Volume 0x0c, Issue 0x41, Phile #0x04 of 0x0f

|=-----------------------------------------------------------------------=|
|=---=[ Stealth hooking : Another way to subvert the Windows kernel ]=---=|
|=-----------------------------------------------------------------------=|
|=--------------------=[ by mxatone and ivanlef0u ]=---------------------=|
|=-----------------------------------------------------------------------=|


1 - Introduction on anti-rootkits technologies and bypass
    1.1 - Rootkits and anti-rootkits techniques
    1.2 - About kernel level protections
    1.3 - Concept key: use kernel code against itself

2 - Introducing stealth hooking on IDT
    2.1 - How Windows manages hardware interrupts
        2.1.1 - Hardware interrupts dispatching on Windows
        2.1.2 - Hooking hardware IT like a ninja
        2.1.3 - Application 1 : Kernel keylogger
        2.1.4 - Application 2 : NDIS incoming packets sniffer
    2.2 - Conclusion about stealth hooking on IDT

3 - Owning NonPaged pool using stealth hooking
    3.1 - Kernel allocation layout review
        3.1.1 - Difference between Paged and NonPaged pool
        3.1.2 - NonPaged pool tables
        3.1.3 - Allocation and free algorithms
    3.2 - Getting code execution abusing allocation code
        3.2.1 - Data corruption of MmNonPagedPoolFreeListHead
        3.2.2 - Expend it for every size
    3.3 - Exploit our position
        3.3.1 - Generic stack redirection
        3.3.2 - Userland process code injection

4 - Detection
5 - Conclusion
6 - References


---[ 1 - Introduction on anti-rootkits technologies and bypass

Nowadays rootkits and anti-rootkits are becoming more and more important
in the IT security landscape. Loved by some, hated by others, rootkits
can be considered the holy grail of backdoors : stealthy, small, close
to the hardware, ingenious, vicious... The control they give over a
computer, locally or remotely, makes them the best choice for an attacker.
Anti-rootkits try to detect and eradicate those malicious programs.
Rk techniques and complexity are evolving fast, and today developing a rk
or an anti-rk is a very hard mission.

This paper deals with rootkits on the Windows platform, and more precisely
with a new kind of hijacking technique that can be applied to the Windows
kernel. Readers are assumed to be familiar with rootkit techniques on
Windows.

----[ 1.1 - Rootkits and anti-rootkits techniques

A rootkit hijacks an operating system's behavior. To achieve this it can
simply modify the operating system's binaries, but that is not very
stealthy. Most rk's hook important functions and change their results. A
basic hook redirects the execution flow by patching the start of a function
or by replacing a function pointer, but there is no single way to hook a
routine. The most common example is the SSDT (System Service Descriptor
Table), which holds the syscall list as a set of function pointers. If you
can modify a pointer in this table, you control the behavior of one
function. That is only one example of how rootkits proceed; obviously there
are a lot of critical areas that can be controlled by an attacker.

Anti-rootkits try to check those areas, but the task is very hard. Most of
the time, anti-rk software compares the memory image of a program with its
binary on disk, or verifies some function pointer tables to see if
something has changed.

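The pointer table verification mentioned above can be pictured with a small
user-mode sketch. It is only an illustration, not the way any particular
anti-rk works: the module_range type, the function name and the addresses
used with it are hypothetical. The idea is simply that every entry of a
table owned by a module should point inside that module's image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a loaded module's address range. */
typedef struct {
    uintptr_t base;   /* first byte of the image in memory */
    size_t    size;   /* size of the image in bytes */
} module_range;

/* Returns the index of the first table entry that points outside the
 * module, or -1 if every entry lies inside [base, base + size). */
int first_foreign_pointer(const uintptr_t *table, size_t count,
                          module_range mod)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i] < mod.base || table[i] - mod.base >= mod.size)
            return (int)i;
    }
    return -1;
}
```

A checker built this way only sees pointers that leave the image. It says
nothing about code that was patched in place, and nothing about
dynamically allocated structures, which is exactly the blind spot this
paper is about.
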
That's how the war between rk-makers and anti-rk-junkies began, each side
trying to find the best way, the best area, for hooking critical operating
system features. On Windows the following areas are often used by
rootkits :

- SSDT (kernel syscall table) and shadow SSDT (win32k syscall table) are
  the simplest solution.

- MSR (Model Specific Registers) can be modified by a rootkit. On Windows
  the MSR_SYSENTER_EIP is used by the assembly instruction 'sysenter' to
  enter ring0 mode. Hijacking this MSR allows an attacker to control
  the system.

- MajorFunctions are functions used by drivers for I/O processing with
  other devices; hooking those functions can be useful for a rootkit.

- IDT (Interrupt Descriptor Table) is the table used by the system for
  handling exceptions and interrupts.

Another kind of technique has appeared. By accessing kernel objects
directly, a rootkit can easily change information about processes, threads,
loaded modules and other stuff. Those techniques are called DKOM (Direct
Kernel Object Manipulation). For example, the Windows kernel maintains a
doubly linked list called PsActiveProcessList (EPROCESS structures)
containing information about running processes. Unlink one of them and
your process will disappear from process lists like the task manager's,
while the process is still running.

To catch those kernel object modifications, anti-rk tools check other
places. For processes, they used to read the PspCidTable, which is a
table of PIDs (Process IDentifiers) and TIDs (Thread IDentifiers).
A comparison between this table and PsActiveProcessList reveals hidden
processes. Against such attacks anti-rk tools have to find other
places and tricks to detect altered objects.

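To make the unlink trick and the cross-view comparison concrete, here is a
minimal user-mode simulation. Nothing in it touches real kernel objects:
the proc structure is a toy stand-in for an EPROCESS, and the plain array
of PIDs plays the role of the second view (the PspCidTable in the text).
All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an EPROCESS linked into the active process list. */
typedef struct proc {
    struct proc *flink, *blink;
    unsigned     pid;
} proc;

/* Insert p at the tail of the circular list rooted at head. */
void list_insert_tail(proc *head, proc *p)
{
    p->flink = head;
    p->blink = head->blink;
    head->blink->flink = p;
    head->blink = p;
}

/* DKOM-style unlink: the neighbours skip p, p itself is left alone. */
void list_unlink(proc *p)
{
    p->blink->flink = p->flink;
    p->flink->blink = p->blink;
}

int list_contains(const proc *head, unsigned pid)
{
    for (const proc *it = head->flink; it != head; it = it->flink)
        if (it->pid == pid)
            return 1;
    return 0;
}

/* Cross-view check: every PID known to the second view but missing from
 * the list is copied to hidden[]. Returns the number of PIDs found. */
size_t find_hidden(const proc *head, const unsigned *table, size_t count,
                   unsigned *hidden)
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++)
        if (!list_contains(head, table[i]))
            hidden[found++] = table[i];
    return found;
}
```

The check only works as long as the second view is itself untouched, which
is why the text speaks of a permanent search for new places to compare.
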
One of the first papers about Windows stealth was written by Holy Father,
"Invisibility on NT boxes" [1]. With this paper came one of the first
public implementations of a rootkit with a ring0 driver, Hacker
Defender [2], coded by Holy Father and Ratter of the famous VXing mag 29A
[3]. This driver was able to elevate process rights using token
manipulation. The rest of the rootkit uses user-land hooks to hide files
and registry keys, and infects processes with dll injection. A good example
of a full ring0 rootkit is NT Rootkit by Greg Hoglund [4]; this driver uses
SSDT hooks to perform stealth operations. It registers a Filter Device
Object above the NTFS file system and above the keyboard device for
filtering IRP (I/O Request Packets). It also provides an NDIS protocol
driver to hide communication on the network. Even if this rk was written
for NT 4.0 and Win2K it's a perfect example for beginners.

Later came more advanced ring0 rk like FU [5], written by Fuzen_op, and its
improvement FUto, published in the famous technical journal Uninformed [6].
Vista's improvements on driver verification led to new rootkits mostly
based on hardware features, like BootRoot [7] and Pixie [8] by eEye, which
are loaded before any protection. Finally Joanna Rutkowska with her Blue
Pill [9] used virtualization technology to create a layer between the
operating system and the hardware.

In the wild, rk are used most of the time for lame mail spamming or
botnets. They often use old techniques but some of them are interesting,
like the Rustock [10] series, StormWorm [11] and the MBR rootkit [12]. They
implement a lot of tricks such as ADS (Alternate Data Streams), code
obfuscation, anti-debug, anti-VM or polymorphic code. The goal is not only
to subvert the kernel but also to slow down their analysis and make them
harder to defeat.

Even if the technologies used by rootkits are more and more sophisticated,
the underground community is still developing POCs to improve current
techniques. Unreal [13] and AK992 [14] are both great examples. The first
uses an ADS and NTFS MajorFunctions hooking to hide itself, the second
checks IRP completion when IRP are sent to disk devices. You can find
plenty of examples of rootkit techniques on rootkit.com.

Finally, this part would not be complete without a word about anti-rk.
The most famous is Rk Unhooker by MP_ART & EP_X0FF and their team UG North.
Other anti-rk are DarkSpy [15] by CardMagic, IceSword [16] by pjf and
Gmer [17].

----[ 1.2 - About kernel level protections

When we talk about protection, we must look at where the protection takes
place in the system. A protection has an advantage over an attack only
if it operates from a higher level. Protections like PaX or Exec Shield
are efficient because they protect userland from the kernel.

Protections like PatchGuard and other HIPS also protect the system
integrity, but as soon as an attacker finds a way to attack those
protections at their own level they become useless. A protection is
reliable only if it can't be corrupted by an attacker. If an attacker
finds a way to inject code into the protection, you can consider that
your b0x is dead.

That's why PatchGuard isn't so efficient [18]. But we know that disabling
or destroying a protection is very noisy. No, the best way is to fly under
the radar by working with special objects and events that cannot be
checked because of their volatility.

In June 2006, Greg Hoglund presented the concept of KOH (Kernel Object
Hooking) [19]. It is a new way of detouring code execution: you don't have
to modify a static code section, you rather work on dynamically allocated
structures/code like DPC (Deferred Procedure Calls). For protections,
it's hard to find and verify those areas because they are so unstable.

Other cool objects are IRP. They are the objects used by the Windows
kernel I/O manager to communicate with devices. Each I/O operation on
hardware generates an IRP; syscalls send IRP to a driver through its
device. In general a driver owns several devices; one of them is used to
communicate with userland through IOCTL and the other devices manage IRP
by filtering them or performing a requested task. IRP are sent to a driver
through its MajorFunctions table. This table lists the different
functionalities provided by the driver. You can check the result returned
by a MajorFunction by installing a completion routine on an IRP. IRP are
very volatile objects; controlling and checking them is very hard.

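The relation between an IRP, the MajorFunctions table and a completion
routine can be sketched in plain user-mode C. This is a toy model under
our own assumptions, not the real I/O manager: the irp and driver
structures, the MJ_* codes and the status values are all made up for the
illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { MJ_CREATE, MJ_READ, MJ_WRITE, MJ_COUNT };  /* illustrative subset */
#define STATUS_OK           0
#define STATUS_UNSUPPORTED (-1)

/* Toy request packet: short-lived, created per I/O operation. */
typedef struct irp {
    int   major;                          /* requested operation */
    int   status;                         /* filled in by the handler */
    void (*completion)(struct irp *req);  /* optional completion routine */
    int   completed_status;               /* written by record_completion */
} irp;

typedef int (*dispatch_fn)(irp *req);

/* Toy driver object: one handler slot per major function code. */
typedef struct {
    dispatch_fn major_function[MJ_COUNT];
} driver;

/* Route the request through the table, then run the completion routine,
 * which gets to see the status returned by the handler. */
int call_driver(const driver *drv, irp *req)
{
    dispatch_fn fn = NULL;

    if (req->major >= 0 && req->major < MJ_COUNT)
        fn = drv->major_function[req->major];
    req->status = fn ? fn(req) : STATUS_UNSUPPORTED;
    if (req->completion)
        req->completion(req);
    return req->status;
}

/* Sample handler and completion routine used to exercise the model. */
int sample_read(irp *req) { (void)req; return STATUS_OK; }
void record_completion(irp *req) { req->completed_status = req->status; }
```

The table is a long-lived object that a checker can compare against a
known good state. The request itself only lives for the duration of one
I/O, which is what makes it so hard to verify.
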
In fact, if you wanted to check everything you would need to completely
redesign the operating system architecture. So keep in mind that protection
cannot be everywhere all the time; we will demonstrate it in the
following parts.

----[ 1.3 - Concept key: use kernel code against itself

The idea behind this paper is to exploit kernel code. Exploitation is
possible because input defines code behavior. Submitting a crafted input
to vulnerable software can lead to code execution. Which input is dangerous
is of course defined by your target. Kernel space offers more exploitation
scenarios because you can change its environment. A rootkit cannot
change basic inputs such as arguments, but it can change the environment
around a piece of code. Heap exploitation techniques such as unlinking are
a perfect example. By changing a memory block structure, you are able to
overwrite 4 bytes. Some techniques can even change the address of the next
allocated block [20]. It works because a program trusts this information.
In the kernel, you have total control over the environment. Besides,
checking the kernel completely is bad for performance and practically
impossible.

Changing the code environment has been used successfully by the phide2
rootkit [21]. This rootkit can hide threads without hooking the Windows
scheduler, which is impressive. As it relies on code behavior, it
requires strong reverse engineering knowledge. It extends this concept to
unknown operating system behaviors. Generic protections are based on
generic assumptions, such as checking only driver images for code hooks.
These days operating system design works against those protections and
calls for advanced software rootkit techniques.

---[ 2 - Introducing stealth hooking on IDT

Let's introduce our concept of stealth hooking with an example based on
the IDT. First we will see what the IDT is and what it is for. Then we will
discuss hardware interrupts and how Windows deals with them.

The IDT (Interrupt Descriptor Table) is a CPU specific linear table located
in kernel-land. Its base address can be read from ring3 (with the sidt
instruction) but you must have ring0 privilege if you want to write into
it. The IDT is composed of 256 KIDTENTRY structures, and you can use the
Kernel Debugger (KD) included in the Debugging Tools for Windows [22] to
see the definition of an IDT entry.

kd> dt nt!_KIDTENTRY
   +0x000 Offset         : Uint2B
   +0x002 Selector       : Uint2B
   +0x004 Access         : Uint2B
   +0x006 ExtendedOffset : Uint2B

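The handler address is split over two fields of the entry: Offset holds
the low 16 bits and ExtendedOffset the high 16 bits. A short sketch of the
recombination, using a local structure that mirrors the layout printed by
the debugger (the C names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the 8-byte nt!_KIDTENTRY layout shown by the debugger. */
typedef struct {
    uint16_t offset;           /* low 16 bits of the handler address */
    uint16_t selector;         /* code segment selector */
    uint16_t access;           /* gate type and privilege bits */
    uint16_t extended_offset;  /* high 16 bits of the handler address */
} kidtentry;

/* Rebuild the 32-bit handler address from the two halves. */
uint32_t idt_handler(const kidtentry *entry)
{
    return ((uint32_t)entry->extended_offset << 16) | entry->offset;
}
```

For example an entry with Offset = 0xf350 and ExtendedOffset = 0x804d
gives 0x804df350, the kind of address a kernel handler has on a 32-bit
system.
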
Here we don't want to (re)explain the architecture of the IDT, so we advise
you to read Kad's paper published in Phrack 59 about the IDT and how it
works [23].

The first 32 entries of the IDT are reserved by the CPU for exceptions. The
others are used to handle hardware interrupts and special system events.

Here is a dump of the first 64 entries of the Windows IDT.

kd> !idt -a
Dumping IDT:

00: 804df350 nt!KiTrap00
01: 804df4cb nt!KiTrap01
02: Task Selector = 0x0058
03: 804df89d nt!KiTrap03
04: 804dfa20 nt!KiTrap04
05: 804dfb81 nt!KiTrap05
06: 804dfd02 nt!KiTrap06
07: 804e036a nt!KiTrap07
08: Task Selector = 0x0050
09: 804e078f nt!KiTrap09
0a: 804e08ac nt!KiTrap0A
0b: 804e09e9 nt!KiTrap0B
0c: 804e0c42 nt!KiTrap0C
0d: 804e0f38 nt!KiTrap0D
0e: 804e164f nt!KiTrap0E
0f: 804e197c nt!KiTrap0F
10: 804e1a99 nt!KiTrap10
11: 804e1bce nt!KiTrap11
12: 804e197c nt!KiTrap0F
13: 804e1d34 nt!KiTrap13
14: 804e197c nt!KiTrap0F
15: 804e197c nt!KiTrap0F
16: 804e197c nt!KiTrap0F
17: 804e197c nt!KiTrap0F
18: 804e197c nt!KiTrap0F
19: 804e197c nt!KiTrap0F
1a: 804e197c nt!KiTrap0F
1b: 804e197c nt!KiTrap0F
1c: 804e197c nt!KiTrap0F
1d: 804e197c nt!KiTrap0F
1e: 804e197c nt!KiTrap0F
1f: 804e197c nt!KiTrap0F

20: 00000000
21: 00000000
22: 00000000
23: 00000000
24: 00000000
25: 00000000
26: 00000000
27: 00000000
28: 00000000
29: 00000000
2a: 804deb92 nt!KiGetTickCount
2b: 804dec95 nt!KiCallbackReturn
2c: 804dee34 nt!KiSetLowWaitHighThread
2d: 804df77c nt!KiDebugService
2e: 804de631 nt!KiSystemService
2f: 804e197c nt!KiTrap0F
30: 806f3d48 hal!HalpClockInterrupt
31: 80dd816c i8042prt!I8042KeyboardInterruptService (KINTERRUPT 80dd8130)
32: 804ddd04 nt!KiUnexpectedInterrupt2
33: 80dd3224 serial!SerialCIsrSw (KINTERRUPT 80dd31e8)
34: 804ddd18 nt!KiUnexpectedInterrupt4
35: 804ddd22 nt!KiUnexpectedInterrupt5
36: 804ddd2c nt!KiUnexpectedInterrupt6
37: 804ddd36 nt!KiUnexpectedInterrupt7
38: 806edef0 hal!HalpProfileInterrupt
39: 80f0827c ACPI!ACPIInterruptServiceRoutine (KINTERRUPT 80f08240)
3a: 80dc67cc vmsrvc+0x1C16 (KINTERRUPT 80dc6790)
3b: 80df6414 NDIS!ndisMIsr (KINTERRUPT 80df63d8)
3c: 80de040c i8042prt!I8042MouseInterruptService (KINTERRUPT 80de03d0)
3d: 804ddd72 nt!KiUnexpectedInterrupt13
3e: 80ed78a4 atapi!IdePortInterrupt (KINTERRUPT 80ed7868)
3f: 80f01dd4 atapi!IdePortInterrupt (KINTERRUPT 80f01d98)
40: 804ddd90 nt!KiUnexpectedInterrupt16
[...]

This dump shows a typical Windows IDT: each entry index is followed by the
address of its handler and the handler's name. The first 32 entries are
filled with KiTrap* functions that manage exceptions. The rest of the table
is left to the system; you can see special system interrupts like
KiSystemService and KiCallbackReturn, and handlers used by drivers like
I8042KeyboardInterruptService or I8042MouseInterruptService.

----[ 2.1 - How Windows manages hardware interrupts

When we talk about interrupts we must introduce the concept of IRQL
(Interrupt ReQuest Level). The kernel represents IRQLs internally as a
number from 0 through 31 on x86, with higher numbers representing higher
priority interrupts. Although the kernel defines the standard set of IRQLs
for software interrupts, the HAL (Hardware Abstraction Layer) maps
hardware interrupt numbers to the IRQLs.

            +----------------+
         31 |    Highest     | \
         to |     IRQLs      | | Clock, system failure.
         27 |                | /
            +----------------+
         26 |                | \
         to |  DEVICE_IRQL   | | Hardware interrupts.
          3 |                | /
            +----------------+
          2 | DISPATCH_LEVEL |   Scheduler, DPC.
            +----------------+
          1 |   APC_LEVEL    |   Used when dispatching APC.
            +----------------+
          0 | PASSIVE_LEVEL  |   Threads run at this IRQL.
            +----------------+

Each processor has its own IRQL. You can have one core running at
IRQL=DISPATCH_LEVEL while another is running at PASSIVE_LEVEL. In fact the
IRQL represents the "maskability" of the currently running code. Interrupts
from a source with an IRQL above the current level interrupt the processor,
whereas interrupts from sources with IRQLs equal to or below the current
level are masked until an executing thread lowers the IRQL.

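The masking rule fits in a few lines of C. The following is a toy model of
a single core, written for this explanation only: the core structure and
both function names are hypothetical, and a real interrupt controller is
of course more subtle than one bitmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Toy model of one core. */
typedef struct {
    int      irql;     /* current IRQL of the core */
    uint32_t pending;  /* bit n set: an interrupt at IRQL n is waiting */
} core;

/* Returns true if the interrupt preempts the core right away. Otherwise
 * it is masked and remembered as pending. source_irql must be in 0..31. */
bool raise_interrupt(core *c, int source_irql)
{
    if (source_irql > c->irql)
        return true;
    c->pending |= UINT32_C(1) << source_irql;
    return false;
}

/* Lower the IRQL and return the highest pending level that is now above
 * it (clearing its pending bit), or -1 if nothing can be delivered. */
int lower_irql(core *c, int new_irql)
{
    c->irql = new_irql;
    for (int level = 31; level > new_irql; level--) {
        if (c->pending & (UINT32_C(1) << level)) {
            c->pending &= ~(UINT32_C(1) << level);
            return level;
        }
    }
    return -1;
}
```

With the core at DISPATCH_LEVEL, a device interrupt at IRQL 26 goes
through immediately, while requests at APC_LEVEL or DISPATCH_LEVEL stay
pending until the IRQL drops below them.
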
Some system components are not accessible when code is running at
IRQL>=DISPATCH_LEVEL. Accessing paged memory (memory which can be
swapped to disk) is impossible and lots of kernel functions cannot be used.

Hardware interrupts are asynchronous and raised by external peripherals.
For example when you hit a key, your keyboard device sends an IRQ
(Interrupt ReQuest), routed by the Southbridge [24] to your interrupt
controller through the Northbridge [25]. The Southbridge is a chip that can
be described as an I/O controller hub. It receives all the external I/O
interrupts and sends them to the Northbridge. The Northbridge is directly
connected to your memory, to the high speed graphics bus and to your
CPU. This chip is also known as the memory controller hub.

On most x86 systems we find a chipset called i82489, the Advanced
Programmable Interrupt Controller (APIC). The APIC is composed of 2 main
components: an I/O APIC, one per CPU, and a LAPIC (Local APIC) on each
core. The I/O APIC uses a routing algorithm to dispatch an interrupt to the
best suited core. According to the principle of locality, the I/O APIC will
deliver a device interrupt to the core which handled it the previous
time [26].

The LAPIC then translates the IRQ into an 8-bit value, the interrupt
vector. This interrupt vector is the index of the IDT entry associated with
the handler. When the core is ready to handle the interrupt, its
instruction flow is redirected to the IDT entry.

           IDT            IDT            IDT           IDT
            1              2              3             4
          +---+          +---+          +---+         +---+
          |   |          |   |          |   |         |   |
          |---|          |---|          |---|         |---|
          |   |          |   |          |   |         |   |
          |---|          |---|          |---|         |---|
          |   |          |   |          |   |         |   |
          +---+          +---+          +---+         +---+
            |              |              |             |
        +--------+     +--------+     +--------+    +--------+
        |        |     |        |     |        |    |        |
        | core 1 |     | core 2 |     | core 3 |    | core 4 |
        |        |     |        |     |        |    |        |
        +--------+     +--------+     +--------+    +--------+
        | LAPIC  |     | LAPIC  |     | LAPIC  |    | LAPIC  |
        +---+----+     +---+----+     +---+----+    +---+----+
            |              |              |             |
            |              |              |             |
        <---+--------------+------+-------+-------------+----->
        Interrupt                 |   Processor system bus
        Messages                  |
                                  |
                                  |
        External           +------+------+
        Interrupts         |             |
            --------------->  I/O APIC   |
                           |             |
                           +-------------+

-----[ 2.1.1 - Hardware interrupts dispatching on Windows

On Windows, the interrupt handler isn't executed immediately; a code
template runs first. This template is implemented in the function
KiInterruptTemplate and does two things: it saves the current core state
on the stack, then it dispatches the code flow to the right "interrupt
dispatcher".

When an interrupt is raised, after the core state is saved, code flow
is transferred to the interrupt handler as defined in the IDT. In fact
each hardware interrupt handler in the IDT points to a KiInterruptTemplate
routine [27]. KiInterruptTemplate will call KiInterruptDispatch, which
performs the following operations :

- Raise the IRQL to DEVICE_IRQL. The IRQL of a device is calculated by
  subtracting its IRQ number from 27d (the keyboard, IRQ 1 at vector
  0x31, gets 26 = 0x1a).

- Acquire the service routine spinlock.

- Call the interrupt handler, an ISR (Interrupt Service Routine).

- Release the service routine spinlock.

- Lower the IRQL.

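The sequence above can be replayed in a user-mode simulation. Everything
here is a sketch under stated assumptions: the kinterrupt structure only
keeps the fields we need, the global current_irql stands for the IRQL of
one core, and device_irql() assumes that hardware vectors start at 0x30 as
they do in the IDT dump. The sketch raises the IRQL before it takes the
lock, which is the usual ordering for an interrupt spinlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a KINTERRUPT; only the fields used here. */
typedef struct kinterrupt {
    bool (*service_routine)(struct kinterrupt *obj, void *context);
    void *service_context;
    int   spin_lock;          /* 0 = free, 1 = held */
    int   synchronize_irql;   /* IRQL at which the ISR runs */
} kinterrupt;

int current_irql = 0;         /* the simulated core starts at PASSIVE_LEVEL */

/* Assumption: vector 0x30 is IRQ 0, as in the IDT dump of this paper. */
int device_irql(unsigned vector)
{
    return 27 - (int)(vector - 0x30u);
}

/* Simulated dispatcher: raise, lock, call the ISR, unlock, lower. */
bool dispatch_interrupt(kinterrupt *obj)
{
    int  old_irql = current_irql;
    bool handled;

    current_irql = obj->synchronize_irql;
    obj->spin_lock = 1;
    handled = obj->service_routine(obj, obj->service_context);
    obj->spin_lock = 0;
    current_irql = old_irql;
    return handled;
}

/* Sample ISR: records what it observed so that the caller can check it. */
typedef struct { int irql_seen; int lock_seen; } isr_trace;

bool sample_isr(kinterrupt *obj, void *context)
{
    isr_trace *trace = context;

    trace->irql_seen = current_irql;
    trace->lock_seen = obj->spin_lock;
    return true;
}
```

For the keyboard vector 0x31 the helper returns 0x1a, and the sample ISR
sees exactly that IRQL with the lock held while it runs.
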
For example, the keyboard device ISR is I8042KeyboardInterruptService.
ISR are routines for handling interrupts, like top-halves in the Linux
kernel. According to the WDK (Windows Driver Kit), the ISR must do
whatever is appropriate to the device to dismiss the interrupt. Then, it
should do only what is necessary to save state and queue a DPC. It means
that most of the interrupt processing takes place at a lower IRQL than the
one used during ISR execution. The I/O processing is done in the DPC.

DPC (Deferred Procedure Calls) are the equivalent of bottom-halves in
Linux. DPC run at IRQL DISPATCH_LEVEL, lower than the ISR's IRQL. In fact
the ISR queues a DPC to process the rest of the interrupt at a lower IRQL
so that the core is not preempted for too long. For the keyboard
the DPC is I8042KeyboardIsrDpc. Here is a figure to sum up the interrupt
processing :

                                             +-------------------------+
  Hardware Interrupt                    /----> Here we are at          |
          |                             |    | IRQL=DEVICE_LEVEL       |
          |                             |    | The KiInterruptDispatch |
          /---> IDT ---\                |    | routine calls the ISR.  |
                       |                |    |                         |
                       |                |    | ISR handles interrupt   |
            +-----------------------+   |    | and queues a DPC for    |
            | KiInterruptTemplate ------/    | later processing        |
            +-----------------------+        |                         |
                                             +-------------------------+

KiInterruptDispatch receives one main argument from KiInterruptTemplate,
a pointer to an interrupt object stored in the EDI register. Interrupt
objects are defined by a KINTERRUPT structure :

kd> dt nt!_KINTERRUPT
   +0x000 Type               : Int2B
   +0x002 Size               : Int2B
   +0x004 InterruptListEntry : _LIST_ENTRY
   +0x00c ServiceRoutine     : Ptr32 unsigned char
   +0x010 ServiceContext     : Ptr32 Void
   +0x014 SpinLock           : Uint4B
   +0x018 TickCount          : Uint4B
   +0x01c ActualLock         : Ptr32 Uint4B
   +0x020 DispatchAddress    : Ptr32 void
   +0x024 Vector             : Uint4B
   +0x028 Irql               : UChar
   +0x029 SynchronizeIrql    : UChar
   +0x02a FloatingSave       : UChar
   +0x02b Connected          : UChar
   +0x02c Number             : Char
   +0x02d ShareVector        : UChar
   +0x030 Mode               : _KINTERRUPT_MODE
   +0x034 ServiceCount       : Uint4B
   +0x038 DispatchCount      : Uint4B
   +0x03c DispatchCode       : [106] Uint4B

In this structure we find the SpinLock and the ServiceRoutine. Notice
that SynchronizeIrql contains the IRQL at which the ISR will be executed.

For each IDT entry which handles a hardware interrupt, a copy of
KiInterruptTemplate is stored in the DispatchCode array of the
KINTERRUPT structure.

For the keyboard device we have this KINTERRUPT :

kd> dt nt!_KINTERRUPT 80dd8130
   +0x000 Type               : 22
   +0x002 Size               : 484
   +0x004 InterruptListEntry : _LIST_ENTRY [ 0x80dd8134 - 0x80dd8134 ]
   +0x00c ServiceRoutine     : 0xfa815495 unsigned char
                               ->i8042prt!I8042KeyboardInterruptService+0
   +0x010 ServiceContext     : 0x80e2ec88
   +0x014 SpinLock           : 0
   +0x018 TickCount          : 0xffffffff
   +0x01c ActualLock         : 0x80e2ed48 -> 0
   +0x020 DispatchAddress    : 0x804da8d8 void nt!KiInterruptDispatch+0
   +0x024 Vector             : 0x31
   +0x028 Irql               : 0x1a ''
   +0x029 SynchronizeIrql    : 0x1a ''
   +0x02a FloatingSave       : 0 ''
   +0x02b Connected          : 0x1 ''
   +0x02c Number             : 0 ''
   +0x02d ShareVector        : 0 ''
   +0x030 Mode               : 1 ( Latched )
   +0x034 ServiceCount       : 0
   +0x038 DispatchCount      : 0xffffffff
   +0x03c DispatchCode       : [106] 0x56535554

Let's have a look at the beginning of KiInterruptTemplate :

nt!KiInterruptTemplate:
804da972 54                     push    esp
804da973 55                     push    ebp
804da974 53                     push    ebx
804da975 56                     push    esi
804da976 57                     push    edi
804da977 83ec54                 sub     esp,54h
804da97a 8bec                   mov     ebp,esp
804da97c 89442444               mov     dword ptr [esp+44h],eax
804da980 894c2440               mov     dword ptr [esp+40h],ecx
804da984 8954243c               mov     dword ptr [esp+3Ch],edx
804da988 f744247000000200       test    dword ptr [esp+70h],20000h
804da990 0f852a010000           jne     nt!V86_kit_a (804daac0)
804da996 66837c246c08           cmp     word ptr [esp+6Ch],8
804da99c 7423                   je      nt!KiInterruptTemplate+0x4f (804da9c1)
804da99e 8c642450               mov     word ptr [esp+50h],fs
804da9a2 8c5c2438               mov     word ptr [esp+38h],ds
804da9a6 8c442434               mov     word ptr [esp+34h],es
804da9aa 8c6c2430               mov     word ptr [esp+30h],gs
804da9ae bb30000000             mov     ebx,30h
804da9b3 b823000000             mov     eax,23h
804da9b8 668ee3                 mov     fs,bx
804da9bb 668ed8                 mov     ds,ax
804da9be 668ec0                 mov     es,ax
804da9c1 648b1d00000000         mov     ebx,dword ptr fs:[0]
804da9c8 64c70500000000ffffffff mov     dword ptr fs:[0],0FFFFFFFFh
804da9d3 895c244c               mov     dword ptr [esp+4Ch],ebx
804da9d7 81fc00000100           cmp     esp,10000h
804da9dd 0f82b5000000           jb      nt!Abios_kit_a (804daa98)
804da9e3 c744246400000000       mov     dword ptr [esp+64h],0
804da9eb fc                     cld
804da9ec 8b5d60                 mov     ebx,dword ptr [ebp+60h]
804da9ef 8b7d68                 mov     edi,dword ptr [ebp+68h]
804da9f2 89550c                 mov     dword ptr [ebp+0Ch],edx
804da9f5 c74508000ddbba         mov     dword ptr [ebp+8],0BADB0D00h
804da9fc 895d00                 mov     dword ptr [ebp],ebx
804da9ff 897d04                 mov     dword ptr [ebp+4],edi
804daa02 f60550f0dfffff         test    byte ptr ds:[0FFDFF050h],0FFh
804daa09 750d                   jne     nt!Dr_kit_a (804daa18)

nt!KiInterruptTemplate2ndDispatch:
804daa0b bf00000000             mov     edi,0
nt!KiInterruptTemplateObject:
804daa10 e9c3fcffff             jmp     nt!KeSynchronizeExecution+0x2 (804da6d8)
[...]

Remember, this code is unique for each KINTERRUPT. We said before that
KiInterruptDispatch receives its argument in the EDI register (a
pointer to the KINTERRUPT of the interrupt). In KiInterruptTemplate
we can see this little piece of code :

[...]
nt!KiInterruptTemplate2ndDispatch:
804daa0b bf00000000             mov     edi,0
nt!KiInterruptTemplateObject:
804daa10 e9c3fcffff             jmp     nt!KeSynchronizeExecution+0x2 (804da6d8)
[...]

Here we have a "mov edi, 0" and a jmp, but if we look at the
KiInterruptTemplate code contained in the keyboard's KINTERRUPT we have :

ffb72525 bf5024b7ff             mov     edi,0FFB72450h ; Keyboard KINTERRUPT
ffb7252a e9a9839680             jmp     nt!KiInterruptDispatch (804da8d8)

Wow, the instructions are modified! The kernel dynamically changes those
2 instructions in each copy of the KiInterruptTemplate code. EDI is loaded
with the address of the KINTERRUPT object and the jmp branches to
KiInterruptDispatch.

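Both instructions are easy to decode by hand, and doing so is a good
sanity check when reading such dumps. "mov edi, imm32" is encoded as BF
followed by the immediate, and "jmp rel32" as E9 followed by a
displacement relative to the next instruction. The sketch below decodes
the bytes shown above in user mode. The function names are ours, and the
two offsets (0x3c for DispatchCode, 0x99 for the mov inside the template,
that is 804daa0b - 804da972) are taken from the dumps of this paper; they
may differ on another build.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit value. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* "mov edi, imm32" is encoded BF imm32. Returns 1 on a match. */
int decode_mov_edi(const uint8_t *code, uint32_t *imm)
{
    if (code[0] != 0xBF)
        return 0;
    *imm = read_le32(code + 1);
    return 1;
}

/* "jmp rel32" is encoded E9 rel32. The target is relative to the end of
 * the 5-byte instruction; uint32_t arithmetic wraps modulo 2^32. */
int decode_jmp_rel32(const uint8_t *code, uint32_t address, uint32_t *target)
{
    if (code[0] != 0xE9)
        return 0;
    *target = address + 5u + read_le32(code + 1);
    return 1;
}

/* Offsets read from the dumps in this paper (assumption: same build). */
#define DISPATCH_CODE_OFFSET 0x3Cu   /* KINTERRUPT.DispatchCode */
#define MOV_EDI_OFFSET       0x99u   /* mov edi inside the template */

/* Where the "mov edi" of a given KINTERRUPT is expected to live. */
uint32_t expected_mov_address(uint32_t kinterrupt)
{
    return kinterrupt + DISPATCH_CODE_OFFSET + MOV_EDI_OFFSET;
}
```

On the keyboard bytes, the immediate decodes to 0xffb72450 and the jump
target to 0x804da8d8, which is KiInterruptDispatch. The pointer also
refers back to the object that contains the code: 0xffb72450 + 0x3c +
0x99 is 0xffb72525, the address of the mov itself.
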
Why this implementation ? Because it makes it easy to change the dispatch
handler. Even if we often find KiInterruptDispatch, we can also find
KiFloatingDispatch or KiChainedDispatch. KiChainedDispatch is for vectors
shared among multiple interrupt objects and KiFloatingDispatch is like
KiInterruptDispatch, but it saves the floating point state too.

Windows provides APIs for connecting interrupts to the IDT:
IoConnectInterrupt and IoConnectInterruptEx. According to the WDK :

NTSTATUS
  IoConnectInterrupt(
    OUT PKINTERRUPT  *InterruptObject,
    IN PKSERVICE_ROUTINE  ServiceRoutine,
    IN PVOID  ServiceContext,
    IN PKSPIN_LOCK  SpinLock  OPTIONAL,
    IN ULONG  Vector,
    IN KIRQL  Irql,
    IN KIRQL  SynchronizeIrql,
    IN KINTERRUPT_MODE  InterruptMode,
    IN BOOLEAN  ShareVector,
    IN KAFFINITY  ProcessorEnableMask,
    IN BOOLEAN  FloatingSave
    );

As you can see IoConnectInterrupt returns in the InterruptObject parameter
a pointer to a KINTERRUPT structure, the same one that we find through the
IDT. Previously you have seen two labels in KiInterruptTemplate,
KiInterruptTemplateObject and KiInterruptTemplate2ndDispatch. Those two
labels are used by kernel functions to find the two instructions in the
KiInterruptTemplate routine. KeInitializeInterrupt uses the
KiInterruptTemplateObject label to update the "jmp Ki*Dispatch" and the
KiConnectVectorAndInterruptObject function uses
KiInterruptTemplate2ndDispatch to modify the "mov edi, <&Kinterrupt>".

-----[ 2.3.2 Hooking hardware IDT like a ninja
|
||
|
|
||
|
Now, think about this. We want to hook the IDT in a stealth way, we know
|
||
|
that replacing an entry directly is not the best solution. Anti-rooktits
|
||
|
don't check the dynamically allocated KiInterruptTemplate routine. So we
|
||
|
can modify this routine as we wish. There are three possible ways :
|
||
|
|
||
|
- Redirect the "jmp Ki*Dispatch" on our dispatch routine, we have to code
|
||
|
our dispatch routine, not so hard.
|
||
|
|
||
|
- Change the kinterrupt address passed in EDI by the instruction
|
||
|
"mov edi, <&Kinterrupt>". The new KINTERRUPT will be the same than the
|
||
|
previous one, only the ServiceRoutine will be modified by us.
|
||
|
|
||
|
- Create our own KiInterruptTemplate, hard ...
|
||
|
|
||
|
In this paper, we chose the simplest way. We replace the
"mov edi, <&kinterrupt>" with a "mov edi, <&OurKinterrupt>" and we
implement our own ServiceRoutine. We know that this instruction is
followed by a jmp, so with a disassembly engine we can retrieve the
instruction before the "jmp nt!KiInterruptDispatch" and modify it. We must
keep in mind that when the ServiceRoutine is running, the interrupt has
not been handled yet and we are running at the device IRQL (DIRQL). This
is not a comfortable situation, because a lot of Windows kernel functions
cannot be called at that level. We know that most ISRs queue a DPC, so
after the ISR has been executed, the last entry in the current core's DPC
queue should contain the DPC routine of our interrupt.
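
The search for the "mov edi" operand described above can be sketched in
user-space C. This is only an illustration under stated assumptions: it uses
a naive byte scan instead of the disassembly engine mentioned in the text,
it assumes the usual x86 encodings (0xBF for "mov edi, imm32", 0xE9 for
"jmp rel32"), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the paper: returns the offset of the
 * 32-bit operand of a "mov edi, imm32" (opcode 0xBF) that is immediately
 * followed by a "jmp rel32" (opcode 0xE9), or -1 when no such pair exists. */
static long find_kinterrupt_operand(const uint8_t *code, size_t len)
{
    /* 5 bytes for the mov plus 5 bytes for the jmp */
    for (size_t i = 0; i + 10 <= len; i++) {
        if (code[i] == 0xBF && code[i + 5] == 0xE9)
            return (long)(i + 1);
    }
    return -1;
}

/* Reads a little-endian 32-bit value, as stored on x86. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes a little-endian 32-bit value, as stored on x86. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

A byte scan like this can match operand bytes by accident, which is one
reason a real disassembler is the safer choice on a live template.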
|
||
|
|
||
|
If we want to access the data generated by the interrupt, we must proceed
like the ISR. Replacing the original ISR with our own ISR handler is very
hard, because it depends too much on the hardware device. But we know that
the real I/O is done by the DPC, so when KiInterruptTemplate calls our
ServiceRoutine, we first call the original ServiceRoutine and then replace
the routine of the last DPC entry with ours.
|
||
|
|
||
|
DPCs are represented by KDPC structures :
|
||
|
|
||
|
kd> dt nt!_KDPC
|
||
|
+0x000 Type : Int2B
|
||
|
+0x002 Number : UChar
|
||
|
+0x003 Importance : UChar
|
||
|
+0x004 DpcListEntry : _LIST_ENTRY
|
||
|
+0x00c DeferredRoutine : Ptr32 void
|
||
|
+0x010 DeferredContext : Ptr32 Void
|
||
|
+0x014 SystemArgument1 : Ptr32 Void
|
||
|
+0x018 SystemArgument2 : Ptr32 Void
|
||
|
+0x01c Lock : Ptr32 Uint4B
|
||
|
|
||
|
The DPC list can be found in the KPRCB (Kernel Processor Control Block)
structure of the current processor. The KPRCB is preceded by the fields of
the KPCR (Kernel Processor Control Region) structure, whose address is
stored at FS:[0x1C] on the current processor. The KPRCB starts 0x120 bytes
from the beginning of the KPCR structure.
|
||
|
|
||
|
dt nt!_KPRCB
|
||
|
[...]
|
||
|
+0x860 DpcListHead : _LIST_ENTRY
|
||
|
+0x868 DpcStack : Ptr32 Void ; DPC arguments
|
||
|
+0x86c DpcCount : Uint4B ; DPC core counter
|
||
|
+0x870 DpcQueueDepth : Uint4B ; Numbers of DPC in the list
|
||
|
+0x874 DpcRoutineActive : Uint4B
|
||
|
+0x878 DpcInterruptRequested : Uint4B
|
||
|
+0x87c DpcLastCount : Uint4B
|
||
|
+0x880 DpcRequestRate : Uint4B
|
||
|
+0x884 MaximumDpcQueueDepth : Uint4B
|
||
|
+0x888 MinimumDpcRate : Uint4B
|
||
|
|
||
|
Now that we know how to retrieve the DPC of our interrupt, we can easily
replace its routine with our own and handle the data.
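
The replacement of the last queued DPC can be modelled in user space. The
sketch below is not the authors' driver code: the structures are simplified
stand-ins for LIST_ENTRY and KDPC, the routine signature is reduced to one
argument, and every name ending in _model or starting with sample_ is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list_entry { struct list_entry *Flink, *Blink; } list_entry;
typedef void (*dpc_routine)(void *context);

/* Simplified stand-in for KDPC: only the fields used by the sketch. */
typedef struct kdpc_model {
    short Type;
    list_entry DpcListEntry;
    dpc_routine DeferredRoutine;
    void *DeferredContext;
} kdpc_model;

/* Same idea as the DDK CONTAINING_RECORD macro. */
#define CONTAINING(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

static void list_init(list_entry *head) { head->Flink = head->Blink = head; }

static void list_insert_tail(list_entry *head, list_entry *e)
{
    e->Flink = head;
    e->Blink = head->Blink;
    head->Blink->Flink = e;
    head->Blink = e;
}

/* Replaces the routine of the last queued DPC (head->Blink) and returns
 * the original routine, or NULL when the queue is empty. */
static dpc_routine hijack_last_dpc(list_entry *head, dpc_routine replacement)
{
    if (head->Blink == head)
        return NULL;
    kdpc_model *dpc = CONTAINING(head->Blink, kdpc_model, DpcListEntry);
    dpc_routine original = dpc->DeferredRoutine;
    dpc->DeferredRoutine = replacement;
    return original;
}

/* Sample routines used only to exercise the sketch. */
static void sample_other_dpc(void *context) { (void)context; }
static void sample_original_dpc(void *context) { (void)context; }
static void sample_replacement_dpc(void *context) { (void)context; }
```

The returned pointer is what a replacement handler would need if it wants
to chain to the original routine.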
|
||
|
|
||
|
For the keyboard the DPC is queued by KeInsertQueueDpc in the
|
||
|
I8xQueueCurrentKeyboardInput routine called by the keyboard's ISR.
|
||
|
|
||
|
kd> dt nt!_KDPC 80e3461c
|
||
|
+0x000 Type : 19 ; 19=DpcObject
|
||
|
+0x002 Number : 0 ''
|
||
|
+0x003 Importance : 0x1 ''
|
||
|
+0x004 DpcListEntry : _LIST_ENTRY [ 0xffdff980 - 0x80559684 ]
|
||
|
+0x00c DeferredRoutine : 0xfa815650 void i8042prt!I8042KeyboardIsrDpc
|
||
|
+0x010 DeferredContext : 0x80e343b8
|
||
|
+0x014 SystemArgument1 : (null)
|
||
|
+0x018 SystemArgument2 : (null)
|
||
|
+0x01c Lock : 0xffdff9c0 -> 0
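
As a quick consistency check of the offsets given above, the address of
DpcListHead can be computed from the KPCR base. The sketch assumes the
KPCR of processor 0 sits at 0xffdff000, as reported by the !pcr output
shown later in this paper, and uses the offsets 0x120 and 0x860 from the
dumps; they are only valid for the build those dumps were taken from.

```c
#include <assert.h>
#include <stdint.h>

enum {
    KPCR_PRCB_OFFSET = 0x120,           /* KPRCB inside the KPCR       */
    KPRCB_DPC_LIST_HEAD_OFFSET = 0x860  /* DpcListHead inside the KPRCB */
};

/* Returns the virtual address of DpcListHead for a given KPCR base. */
static uint32_t dpc_list_head_address(uint32_t kpcr_base)
{
    return kpcr_base + KPCR_PRCB_OFFSET + KPRCB_DPC_LIST_HEAD_OFFSET;
}
```

For a base of 0xffdff000 this gives 0xffdff980, which is the Flink value
of the keyboard DPC in the dump above: the DPC points back to the list
head of the core.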
|
||
|
|
||
|
Here is the figure of the attack :
|
||
|
|
||
|
                                             MyKinterrupt structure
                                             +---------------------+
Hardware Interrupt                     /----> MyServiceRoutine     |
     |                                 |     | Calls the original  |
     |                                 |     | ISR             ------\
     \---> IDT ---\                    |     | And modify the DPC  | |
                  |                    |     | queue.              | |
                  |                    |     +---------------------+ |
        +---------------------+        |                             |
        | KiInterruptTemplate ---------/       Original Kinterrupt   |
        +---------------------+              +---------------------+ |
    Core                                     |                     | |
+------------+                               | ServiceRoutine  <-----/
|            |                               | Queues the ISR's DPC|
|DpcListHead |--\                            +---------------------+
|            |  |
+------------+  |
                |   +-----+     +-----+     +-----+     +-----+
                \---> DPC |---->| DPC |---->| DPC |---->| DPC |-->DpcListHead
     DpcListHead<---|     |<----|     |<----|     |<----|     |
                    +-----+     +-----+     +-----+     +-----+
                                                          /\
                                                          ||
                                                    Last DPC entry
                                                Modified after the call
                                                to the ServiceRoutine.
|
||
|
|
||
|
-----[ 2.3.3 - Application 1 : Kernel keylogger
|
||
|
|
||
|
It's time to design a POC. In this sample we will see how to sniff
keystrokes. As you saw previously, we are now able to control the
DPC generated by an interrupt. For the keyboard we will hijack the
I8042KeyboardIsrDpc routine, which is the routine of the DPC queued by the
keyboard interrupt. Our own DPC handler must reproduce the behavior of
the original routine. Unfortunately this kind of routine is hard to write,
so we ripped some pieces of code and used reversing techniques (notice the
lazy hacker style).
|
||
|
|
||
|
In our DPC handler we must call the KeyboardClassServiceCallback [28]
routine, which is provided by the Kbdclass driver. This callback
transfers the input data buffer of a device to the class data queue. A
keyboard function driver must call this class service callback in its DPC
routine. Here is the KeyboardClassServiceCallback's prototype :
|
||
|
|
||
|
VOID
|
||
|
KeyboardClassServiceCallback (
|
||
|
IN PDEVICE_OBJECT DeviceObject,
|
||
|
IN PKEYBOARD_INPUT_DATA InputDataStart,
|
||
|
IN PKEYBOARD_INPUT_DATA InputDataEnd,
|
||
|
IN OUT PULONG InputDataConsumed
|
||
|
);
|
||
|
|
||
|
Parameters :
|
||
|
DeviceObject : Pointer to the class device object.
|
||
|
|
||
|
InputDataStart : Pointer to the first keyboard input data packet in
|
||
|
the input data buffer of the port device.
|
||
|
|
||
|
InputDataEnd : Pointer to the keyboard input data packet that
|
||
|
immediately follows the last data packet in the input data buffer of
|
||
|
the port device.
|
||
|
|
||
|
InputDataConsumed : Pointer to the number of keyboard input data
|
||
|
packets that are transferred by the routine.
|
||
|
|
||
|
KEYBOARD_INPUT_DATA is defined by :
|
||
|
|
||
|
typedef struct _KEYBOARD_INPUT_DATA {
|
||
|
USHORT UnitId;
|
||
|
USHORT MakeCode;
|
||
|
USHORT Flags;
|
||
|
USHORT Reserved;
|
||
|
ULONG ExtraInformation;
|
||
|
} KEYBOARD_INPUT_DATA, *PKEYBOARD_INPUT_DATA;
|
||
|
|
||
|
So in our DPC handler we just have to check the MakeCode member of each
KEYBOARD_INPUT_DATA structure in the set. The MakeCode (or scancode)
represents the data sent by the keyboard to the system when you press or
release a key. Each key has its own scancode and the system usually
translates the scancode into a character depending on your keyboard
layout. For example the scancode 18d (0x12) on a classical US keyboard is
translated into the keycode 'e'.
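
A minimal user-space sketch of that check follows. It walks the half-open
range [InputDataStart, InputDataEnd) the same way the class callback
receives it, and keeps the MakeCode of key presses only. The structure
mirrors the definition above with fixed-width types; KEY_MAKE and
KEY_BREAK carry the values defined in ntddkbd.h, and the helper name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of KEYBOARD_INPUT_DATA with fixed-width types. */
typedef struct keyboard_input_data {
    uint16_t UnitId;
    uint16_t MakeCode;
    uint16_t Flags;
    uint16_t Reserved;
    uint32_t ExtraInformation;
} keyboard_input_data;

#define KEY_MAKE  0x0  /* key pressed  */
#define KEY_BREAK 0x1  /* key released */

/* Copies the MakeCode of every key press found in [start, end) into out
 * (at most cap entries) and returns how many were stored. */
static size_t collect_make_codes(const keyboard_input_data *start,
                                 const keyboard_input_data *end,
                                 uint16_t *out, size_t cap)
{
    size_t n = 0;
    for (const keyboard_input_data *p = start; p < end && n < cap; p++) {
        if ((p->Flags & KEY_BREAK) == 0)
            out[n++] = p->MakeCode;
    }
    return n;
}
```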
|
||
|
|
||
|
In order to know if CAPSLOCK is activated we send an IOCTL to the
functional keyboard device, but IOCTLs can only be sent at PASSIVE_LEVEL
IRQL. For that we use a system thread which sends the IOCTL with the
kernel API IoBuildDeviceIoControlRequest. The scancodes are queued
in a list protected by a spinlock, and the thread is synchronized with a
semaphore. The thread waits for incoming keystrokes and then converts the
scancodes into keycodes, like the kernel keylogger Klog does [29].
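
The conversion step done by the thread can be illustrated with a tiny
table. This sketch is not the Klog code: it only covers the top letter row
of a US layout (scan code set 1, make codes 0x10 to 0x19), treats CAPSLOCK
as a plain flag, and the function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Translates a make code of the top letter row into a character.
 * Returns 0 for any make code outside 0x10..0x19. */
static char scancode_to_char(uint16_t make_code, int caps_lock)
{
    static const char top_row[] = "qwertyuiop";

    if (make_code < 0x10 || make_code > 0x19)
        return 0;
    char c = top_row[make_code - 0x10];
    return caps_lock ? (char)toupper((unsigned char)c) : c;
}
```

A complete translation would also have to track SHIFT and the extended
prefixes, which is left out here.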
|
||
|
|
||
|
-----[ 2.3.4 - Application 2 : NDIS packet sniffer
|
||
|
|
||
|
In the same way, an interrupt is raised when your network card receives
a packet. When this kind of interrupt is raised, the NDIS ISR handler
(ndisMIsr) launches the miniport's ISR. The ndisMIsr routine is used as a
wrapper for the miniport ISR and DPC. You can see the following entry in
the IDT :
|
||
|
|
||
|
3b: 80df6414 NDIS!ndisMIsr (KINTERRUPT 80df63d8)
|
||
|
|
||
|
It means that your ISR handler is not called directly when an interrupt
occurs; the ndisMIsr routine is. The miniport's ISR is called by ndisMIsr
and the miniport DPC is also queued in this routine. The DPC queued is
the ndisMDpc routine, which wraps your own miniport DPC handler. In short,
NDIS wraps the whole interrupt process with the ndisMIsr and ndisMDpc
routines on Windows XP with NDIS 5.1. We don't know if this implementation
is still present on Windows Vista with NDIS 6.0.
|
||
|
|
||
|
We know we can replace the ndisMDpc handler with our own handler. With
NDIS we proceed in the same way as for the keyboard, but instead of
hooking the MiniportDpc routine we hook the ndisMDpc routine directly.
Why? Because we know that ndisMDpc wraps the MiniportDpc routine, and
MiniportDpc depends too much on the hardware of the miniport device. Each
miniport device is represented by an NDIS_MINIPORT_BLOCK [30] structure.
In this structure we find a reference to an NDIS_MINIPORT_INTERRUPT
structure, which looks like :
|
||
|
|
||
|
kd> dt ndis!_NDIS_MINIPORT_INTERRUPT
|
||
|
+0x000 InterruptObject : Ptr32 _KINTERRUPT
|
||
|
+0x004 DpcCountLock : Uint4B
|
||
|
+0x008 Reserved : Ptr32 Void
|
||
|
+0x00c MiniportIsr : Ptr32 Void
|
||
|
+0x010 MiniportDpc : Ptr32 Void
|
||
|
+0x014 InterruptDpc : _KDPC
|
||
|
+0x034 Miniport : Ptr32 _NDIS_MINIPORT_BLOCK
|
||
|
+0x038 DpcCount : UChar
|
||
|
+0x039 Filler1 : UChar
|
||
|
+0x03c DpcsCompletedEvent : _KEVENT
|
||
|
+0x04c SharedInterrupt : UChar
|
||
|
+0x04d IsrRequested : UChar
|
||
|
|
||
|
If we look at the ndisMDpc routine we notice that only the first parameter
is used, and this parameter refers to an NDIS_MINIPORT_INTERRUPT structure.
The ndisMDpc function calls the MiniportDpc field of this structure.
We just have to replace this pointer with our routine in order to control
the incoming packets on the system.
|
||
|
|
||
|
The NDIS documentation specifies that a miniport DPC routine must
notify the bound protocol drivers that an array of received
packets is available by calling the NdisMIndicateReceivePacket function
[31].
|
||
|
|
||
|
VOID
|
||
|
NdisMIndicateReceivePacket(
|
||
|
IN NDIS_HANDLE MiniportAdapterHandle,
|
||
|
IN PPNDIS_PACKET ReceivePackets,
|
||
|
IN UINT NumberOfPackets
|
||
|
);
|
||
|
|
||
|
In the ndis.h header we have :
|
||
|
#define NdisMIndicateReceivePacket(_H, _P, _N) \
|
||
|
{ \
|
||
|
(*((PNDIS_MINIPORT_BLOCK)(_H))->PacketIndicateHandler)( \
|
||
|
_H, \
|
||
|
_P, \
|
||
|
_N); \
|
||
|
}
|
||
|
|
||
|
So in our MiniportDpc routine we will hijack the PacketIndicateHandler
of the NDIS_MINIPORT_BLOCK structure, which is often the
ethFilterDprIndicateReceivePacket routine, in order to filter the incoming
packets on the miniport. After we have hijacked this pointer we call the
original MiniportDpc routine, which will process everything. After that,
we restore the PacketIndicateHandler in the NDIS_MINIPORT_BLOCK for
stealth reasons. To sum up we must :
|
||
|
|
||
|
- Hijack the routine of the DPC queued by the ndisMIsr routine.

- Now that we have hijacked the ndisMDpc, we modify the
  PacketIndicateHandler in the NDIS_MINIPORT_BLOCK of the miniport.

- We call the ndisMDpc routine. It will call the original MiniportDpc
  handler.

- The MiniportDpc routine calls the NdisMIndicateReceivePacket macro. Our
  filter function is called and we do our job.

- When the ndisMDpc returns, we restore the original PacketIndicateHandler
  in the NDIS_MINIPORT_BLOCK of the miniport.
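
The swap, call and restore sequence summed up above can be modelled in
user space. This is a sketch under loud assumptions: miniport_model is a
stand-in holding only the handler pointer and two hypothetical counters,
the handler signatures are simplified, and sample_miniport_dpc plays the
role of the wrapped ndisMDpc/MiniportDpc pair.

```c
#include <assert.h>

struct miniport_model;
typedef void (*indicate_fn)(struct miniport_model *mp, unsigned count);
typedef void (*dpc_fn)(struct miniport_model *mp);

/* Stand-in for NDIS_MINIPORT_BLOCK; the counters are for the demo only. */
typedef struct miniport_model {
    indicate_fn PacketIndicateHandler;
    unsigned seen_by_filter;
    unsigned seen_by_protocol;
} miniport_model;

static indicate_fn saved_indicate;  /* original handler while hooked */

/* Plays the role of the legitimate indication handler. */
static void original_indicate(miniport_model *mp, unsigned count)
{
    mp->seen_by_protocol += count;
}

/* Our filter: look at the packets, then forward to the original. */
static void filter_indicate(miniport_model *mp, unsigned count)
{
    mp->seen_by_filter += count;
    saved_indicate(mp, count);
}

/* Plays the role of the original DPC: indicates two packets. */
static void sample_miniport_dpc(miniport_model *mp)
{
    mp->PacketIndicateHandler(mp, 2);
}

/* Replacement DPC: hook the handler only for the duration of the call. */
static void hooked_dpc(miniport_model *mp, dpc_fn original_dpc)
{
    saved_indicate = mp->PacketIndicateHandler;
    mp->PacketIndicateHandler = filter_indicate;
    original_dpc(mp);
    mp->PacketIndicateHandler = saved_indicate;
}
```

Once hooked_dpc returns, the block holds its original pointer again, which
is the stealth property the list above asks for.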
|
||
|
|
||
|
With this filter, we can monitor or modify the incoming packets. For
example, our PacketIndicateHandler hook can search the incoming packets
for a tag; when this tag is found the rootkit triggers a function.
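
The tag search itself is a plain byte search. The sketch below assumes
the packet payload has already been copied into a flat buffer (walking
real NDIS buffer chains is not shown), and the function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when tag (tag_len bytes) occurs anywhere in data, else 0. */
static int packet_contains_tag(const uint8_t *data, size_t len,
                               const uint8_t *tag, size_t tag_len)
{
    if (tag_len == 0 || tag_len > len)
        return 0;
    for (size_t i = 0; i + tag_len <= len; i++) {
        if (memcmp(data + i, tag, tag_len) == 0)
            return 1;
    }
    return 0;
}
```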
|
||
|
|
||
|
---[ 2.2 - Conclusion about stealth hooking on IDT
|
||
|
|
||
|
In this part we have seen how Windows manages its hardware interrupts by
using a global template function dedicated to all interrupts. The fact
that this template routine is forged for each interrupt is the main
point of this attack: with that we can create a fake template routine that
cannot be detected directly. The stealth of our attack rests on two
points :
|
||
|
|
||
|
- We modify only dynamically allocated and forged code.

- We hijack short-lived, dynamically allocated structures which, when
  running, always preempt the core.
|
||
|
|
||
|
So, even if the scope of our attack is restricted, controlling the hardware
is the best way for a rootkit to reach critical components. Finally, we
have just cheated the system with its own features, and that's the purpose
of a stealth rootkit.
|
||
|
|
||
|
--[ 3 - Owning NonPaged pool using stealth hooking
|
||
|
|
||
|
Rootkit sophistication depends on how it subverts the kernel. More
complex techniques come out as kernel and hardware understanding evolves.
Nowadays there are so many ways to subvert the kernel that, as a
consequence, protections become harder to defeat. We're going to present a
different means to gain control. The following techniques apply this
approach to the kernel memory allocator.
|
||
|
|
||
|
Our goal is getting execution on every NonPaged allocation without using
any hook. It must bypass any hooking verification, even those based on code
page comparison or hashing. It will be done by modifying data used by the
allocator. We just apply the concept of using code against itself. We do
believe that this concept can be used successfully on other components and
in different ways.
|
||
|
|
||
|
We won't try to convince you that this technique is perfect. It evades
current protections and detection systems. More importantly, those systems
would need more than a simple modification to prevent and block an attack
based on kernel code behavior.
|
||
|
|
||
|
---[ 3.1 - Kernel allocation layout review
|
||
|
|
||
|
Like every operating system, the Windows kernel exposes some functions in
order to allocate or free memory. Virtual memory is organized in blocks of
memory called pages. On the Intel x86 architecture, a page is 4096 bytes
and most allocation requests are smaller. Thus, kernel functions like
ExAllocatePoolWithTag and ExFreePoolWithTag keep unused memory blocks for
later allocations. Internal functions directly interact with the hardware
each time a page is needed. All those procedures are complex and delicate,
which is why drivers trust the kernel implementation.
|
||
|
|
||
|
-----[ 3.1.1 - Difference between Paged and NonPaged pool
|
||
|
|
||
|
Kernel system memory is divided into two different kinds of pool. They
have been separated to distinguish the most used memory blocks. The system
must know which pages should be resident and which can be temporarily
discarded. The page fault handler restores pageable memory only when the
IRQL is below DPC or DISPATCH level. Paged pool can be paged in or out of
the system. A memory block that is paged out is saved on the file system,
so the unused part of paged memory is not resident in memory. NonPaged
pool is present at every IRQL and is therefore used for important tasks.
|
||
|
|
||
|
The file pagefile.sys contains paged out memory. It was attacked to inject
unsigned code into the Vista kernel [32]. Some solutions were discussed,
such as disabling kernel memory paging. Joanna Rutkowska defended this
solution as more secure than others, at the cost of a small physical
memory loss. Microsoft just denied raw disk access, which may prove that
the Paged and NonPaged layout is an important feature of the Windows
kernel [33].
|
||
|
|
||
|
This article focuses on the NonPaged pool layout, as Paged pool handling
is totally different. NonPaged pool can more or less be considered as
following a typical heap implementation. General information about the
system pools can be found in Microsoft Windows Internals [34].
|
||
|
|
||
|
-----[ 3.1.2 - NonPaged pool tables
|
||
|
|
||
|
The allocation algorithm must be fast for the most used sizes.
That is why three different tables exist, each one devoted to a size
range. We find this organization in most memory management algorithms.
Retrieving memory blocks from hardware takes time. Windows balances between
responding faster and avoiding wasted memory. Response time becomes
shorter if memory blocks are stored for later allocations. On the other
hand, keeping too much memory can penalize other memory demands.
|
||
|
|
||
|
Each table implements a different way to store memory blocks. We will
|
||
|
present each table and where you can find them.
|
||
|
|
||
|
The NonPaged lookaside is a per-processor table covering sizes less than
or equal to 256 bytes. Each processor has a processor control region (PCR)
storing data concerning only a single processor, like the IRQL, GDT and
IDT. Its extension, called the processor control block (PRCB), contains
the lookaside tables. The next windbg dump presents the NonPaged lookaside
table and its structure.
|
||
|
|
||
|
kd> !pcr
|
||
|
KPCR for Processor 0 at ffdff000:
|
||
|
Major 1 Minor 1
|
||
|
NtTib.ExceptionList: 805486b0
|
||
|
NtTib.StackBase: 80548ef0
|
||
|
NtTib.StackLimit: 80546100
|
||
|
NtTib.SubSystemTib: 00000000
|
||
|
NtTib.Version: 00000000
|
||
|
NtTib.UserPointer: 00000000
|
||
|
NtTib.SelfTib: 00000000
|
||
|
|
||
|
SelfPcr: ffdff000
|
||
|
Prcb: ffdff120
|
||
|
Irql: 00000000
|
||
|
IRR: 00000000
|
||
|
IDR: ffffffff
|
||
|
InterruptMode: 00000000
|
||
|
IDT: 8003f400
|
||
|
GDT: 8003f000
|
||
|
TSS: 80042000
|
||
|
|
||
|
CurrentThread: 80551920
|
||
|
NextThread: 00000000
|
||
|
IdleThread: 80551920
|
||
|
|
||
|
DpcQueue: 0x80551f80 0x804ff29c
|
||
|
kd> dt nt!_KPRCB ffdff120
|
||
|
[...]
|
||
|
+0x5a0 PPNPagedLookasideList : [32]
|
||
|
+0x000 P : 0x819c6000 _GENERAL_LOOKASIDE
|
||
|
+0x004 L : 0x8054dd00 _GENERAL_LOOKASIDE
|
||
|
[...]
|
||
|
kd> dt nt!_GENERAL_LOOKASIDE
|
||
|
+0x000 ListHead : _SLIST_HEADER
|
||
|
+0x008 Depth : Uint2B
|
||
|
+0x00a MaximumDepth : Uint2B
|
||
|
+0x00c TotalAllocates : Uint4B
|
||
|
+0x010 AllocateMisses : Uint4B
|
||
|
+0x010 AllocateHits : Uint4B
|
||
|
+0x014 TotalFrees : Uint4B
|
||
|
+0x018 FreeMisses : Uint4B
|
||
|
+0x018 FreeHits : Uint4B
|
||
|
+0x01c Type : _POOL_TYPE
|
||
|
+0x020 Tag : Uint4B
|
||
|
+0x024 Size : Uint4B
|
||
|
+0x028 Allocate : Ptr32 void*
|
||
|
+0x02c Free : Ptr32 void
|
||
|
+0x030 ListEntry : _LIST_ENTRY
|
||
|
+0x038 LastTotalAllocates : Uint4B
|
||
|
+0x03c LastAllocateMisses : Uint4B
|
||
|
+0x03c LastAllocateHits : Uint4B
|
||
|
+0x040 Future : [2] Uint4B
|
||
|
|
||
|
Lookaside tables permit faster block retrieval than a typical doubly
linked list. For this optimization lock time is really important, and a
singly linked list updated atomically is a faster mechanism than software
locking. The ExInterlockedPopEntrySList function is used to pop an entry
from a singly linked list using the hardware locking instruction prefix
"lock".
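
To illustrate why a singly linked list suits this job, here is a minimal
lock-free push/pop sketch written with C11 atomics. It is not the Windows
implementation: the real SLIST_HEADER also packs a depth and a sequence
number next to the pointer, which protects against the ABA problem, while
this simplified version only swaps the head pointer. All names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct slist_entry { struct slist_entry *next; } slist_entry;
typedef struct slist_head { _Atomic(slist_entry *) head; } slist_head;

/* Pushes e on top of the list with a compare-and-swap loop. */
static void slist_push(slist_head *l, slist_entry *e)
{
    slist_entry *old = atomic_load(&l->head);
    do {
        e->next = old;
    } while (!atomic_compare_exchange_weak(&l->head, &old, e));
}

/* Pops the top entry, or returns NULL when the list is empty. */
static slist_entry *slist_pop(slist_head *l)
{
    slist_entry *old = atomic_load(&l->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&l->head, &old, old->next))
        ;
    return old;
}
```

On x86 the compare-and-swap compiles to a "lock cmpxchg" style
instruction, so no spinlock has to be taken for a lookaside hit.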
|
||
|
|
||
|
PPNPagedLookasideList is the lookaside table we were talking about. Each
of its entries contains two lookaside lists, P and L. The Depth field of
the GENERAL_LOOKASIDE structure defines how many entries can be in the
ListHead singly linked list. The system regularly updates the depth using
different counters. The update algorithm is based on the processor number
and is different for P and L. The depth of the P list is updated more
frequently than that of the L list, as it optimizes performance on very
small blocks.
|
||
|
|
||
|
The second table depends on how many processors are used and how the
system manages them. The allocator walks it if the size is less than or
equal to 4080 bytes or if the lookaside lookup failed. Even if the target
table can change, it always has the same POOL_DESCRIPTOR structure. On a
single processor, a variable called PoolVector is used to retrieve the
NonPagedPoolDescriptor pointer. On multiple processors, the
ExpNonPagedPoolDescriptor table has 16 slots containing pool descriptors.
Each processor's PRCB points to a KNODE structure. A node can be linked to
more than one processor and contains a color field used as an index into
ExpNonPagedPoolDescriptor. The next figures illustrate this algorithm.
|
||
|
|
||
|
      PoolVector
    +------------+
    |  NonPaged  | --------------> NonPagedPoolDescriptor
    |------------+
    |   Paged    |
    +------------+
|
||
|
|
||
|
[ Figure 1 - Single processor pool descriptor ]
|
||
|
|
||
|
   Processor #1
  +------------+
  |            |                  ExpNonPagedPoolDescriptor
  |    PRCB ------\                 +-------------------+
  |            |  |          /----->| SLOT #01          |
  +------------+  |          |      | SLOT #02          |
       /----------/          |      | SLOT #03          |
       |         KNODE       |      | SLOT #04          |
       |---> +------------+  |      | SLOT #05          |
       |     | Proc mask  |  |      | SLOT #06          |
       |     | color (01) ---/      | SLOT #07          |
       |     | ...        |         | SLOT #08          |
       |     +------------+         | SLOT #09          |
       |                            | SLOT #10          |
       \----------\                 | SLOT #11          |
   Processor #2   |                 | SLOT #12          |
  +------------+  |                 | SLOT #13          |
  |            |  |                 | SLOT #14          |
  |    PRCB ------/                 | SLOT #15          |
  |            |                    | SLOT #16          |
  +------------+                    +-------------------+
|
||
|
|
||
|
[ Figure 2 - Multiple processor pool descriptor ]
|
||
|
|
||
|
A global variable, ExpNumberOfNonPagedPools, defines whether the
multi-processor case is used. It should reflect the number of processors,
but it can change between operating system versions.
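
The selection between the two cases can be sketched as follows. The
structures are user-space stand-ins, and reducing the color modulo the
number of pools is an assumption of this sketch rather than something
stated in the text; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NONPAGED_POOLS 16

typedef struct pool_descriptor_model { unsigned PoolIndex; } pool_descriptor_model;

/* Stand-in for the globals involved in the choice. */
typedef struct pool_state_model {
    unsigned number_of_nonpaged_pools;           /* ExpNumberOfNonPagedPools  */
    pool_descriptor_model *default_descriptor;   /* PoolVector NonPaged entry */
    pool_descriptor_model *descriptors[MAX_NONPAGED_POOLS];
                                                 /* ExpNonPagedPoolDescriptor */
} pool_state_model;

/* Picks the descriptor used by the current processor, given the color of
 * the KNODE its PRCB points to. */
static pool_descriptor_model *select_descriptor(const pool_state_model *s,
                                                unsigned node_color)
{
    if (s->number_of_nonpaged_pools > 1) {
        /* assumption: keep the index inside the populated slots */
        unsigned index = node_color % s->number_of_nonpaged_pools;
        return s->descriptors[index];
    }
    return s->default_descriptor;
}
```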
|
||
|
|
||
|
The next dump shows POOL_DESCRIPTOR structure from windbg.
|
||
|
|
||
|
kd> dt nt!_POOL_DESCRIPTOR
|
||
|
+0x000 PoolType : _POOL_TYPE
|
||
|
+0x004 PoolIndex : Uint4B
|
||
|
+0x008 RunningAllocs : Uint4B
|
||
|
+0x00c RunningDeAllocs : Uint4B
|
||
|
+0x010 TotalPages : Uint4B
|
||
|
+0x014 TotalBigPages : Uint4B
|
||
|
+0x018 Threshold : Uint4B
|
||
|
+0x01c LockAddress : Ptr32 Void
|
||
|
+0x020 PendingFrees : Ptr32 Void
|
||
|
+0x024 PendingFreeDepth : Int4B
|
||
|
+0x028 ListHeads : [512] _LIST_ENTRY
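
The 512 ListHeads entries line up with the 4080 byte limit quoted in this
paper if blocks are counted in 8-byte units and carry an 8-byte header.
The sketch below shows that arithmetic; the header size and the
granularity are assumptions for 32-bit XP, and the function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    POOL_HEADER_SIZE = 8,   /* assumed size of the block header     */
    POOL_BLOCK_SHIFT = 3,   /* assumed granularity: 8-byte units    */
    POOL_LIST_HEADS  = 512  /* from the POOL_DESCRIPTOR dump above  */
};

/* Returns the ListHeads index serving a request of number_of_bytes:
 * the request plus its header, rounded up to the next 8-byte unit. */
static size_t list_heads_index(size_t number_of_bytes)
{
    size_t rounding = ((size_t)1 << POOL_BLOCK_SHIFT) - 1;
    return (number_of_bytes + POOL_HEADER_SIZE + rounding) >> POOL_BLOCK_SHIFT;
}
```

Under those assumptions a 4080 byte request maps to index 511, the last
entry of the array, and anything larger no longer fits.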
|
||
|
|
||
|
Queued spinlock synchronization, part of the HAL library, is used to
restrict concurrency on a pool descriptor. It ensures that only one thread
on one processor will access and unlink an entry from a pool descriptor.
The HAL library changes between architectures, and what is a simple IRQL
raise on a single processor becomes a more complex queued system on
multi-processor machines. For the default pool descriptor, the general
NonPaged queued spinlock is taken (LockQueueNonPagedPoolLock). Otherwise,
a custom queued spinlock is created.
|
||
|
|
||
|
The third and last table is shared by all processors for sizes greater
than 4080 bytes. MmNonPagedPoolFreeListHead is also used when the other
tables lack memory. It is composed of 4 LIST_ENTRY heads, each one
representing a number of pages, except for the last one, which holds all
larger blocks kept by the system. Access to this table is guarded by the
general NonPaged queued spinlock, also called LockQueueNonPagedPoolLock.
During the free procedure of a smaller block, ExFreePoolWithTag merges it
with the previous and next free blocks. This can create a block greater
than or equal to 1 page. In this case, the new block is added to the
MmNonPagedPoolFreeListHead table.
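
The choice of the list head for a page aligned block can be sketched like
this. It follows the description above (one head per page count, the last
head collecting everything larger); the constant and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MODEL_PAGE_SHIFT = 12,      /* 4096-byte pages on x86            */
    MODEL_FREE_LIST_HEADS = 4   /* entries in the table, see above   */
};

/* Rounds a byte count up to a number of pages. */
static size_t bytes_to_pages(size_t bytes)
{
    return (bytes + ((size_t)1 << MODEL_PAGE_SHIFT) - 1) >> MODEL_PAGE_SHIFT;
}

/* Returns the index of the list head holding blocks of 'pages' pages.
 * pages must be at least 1. */
static size_t free_list_index(size_t pages)
{
    size_t index = pages - 1;
    if (index >= MODEL_FREE_LIST_HEADS)
        index = MODEL_FREE_LIST_HEADS - 1;
    return index;
}
```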
|
||
|
|
||
|
-----[ 3.1.3 - Allocation and free algorithms
|
||
|
|
||
|
Kernel allocation does not change that much between OS versions, but its
algorithm is as hard as the userland heap one. In this part, we want to
illustrate the basic interactions between tables during the allocation and
free procedures. A lot of details have been left out, such as
synchronization mechanisms. Those algorithms will help you follow the
explanation of the technique and understand the basic elements of kernel
allocation. Although kernel exploitation is not part of this paper, pool
overflow is an interesting topic that needs some understanding of this
algorithm.
|
||
|
|
||
|
NonPaged pool allocation algorithm (ExAllocatePoolWithTag):

IF [ Size > 4080 bytes ]
[
    - Call the MiAllocatePoolPages function
        - Walk MmNonPagedPoolFreeListHead LIST_ENTRY table.
        - Retrieve memory from hardware if necessary.
    - Return memory page aligned (without header).
]

IF [ Size <= 256 bytes ]
[
    - Pop entry from PPNPagedLookasideList table.
    - If something is found return memory block.
]

IF [ ExpNumberOfNonPagedPools > 1 ]
    - PoolDescriptor comes from ExpNonPagedPoolDescriptor and the index
      used comes from PRCB KNODE color.
ELSE
    - PoolDescriptor is PoolVector first entry, designated by symbol
      as NonPagedPoolDescriptor.

FOREACH [ >= Size entry of PoolDescriptor.ListHeads ]
[
    IF [ Entry is not empty ]
    [
        - Unlink entry and split it if needed
        - Return memory block
    ]
]

- Call the MiAllocatePoolPages function
    - Walk MmNonPagedPoolFreeListHead LIST_ENTRY table.
- Split it correctly to the right size
- Return new memory block
|
||
|
|
||
|
NonPaged pool free algorithm (ExFreePoolWithTag) :

IF [ MemoryBlock is page aligned ]
[
    - Call the MiFreePoolPages function
        - Determine block type (Paged or NonPaged)
        - Depending on how many blocks are kept in
          MmNonPagedPoolFreeListHead, we release it to the hardware.
]
ELSE
[
    - Merge previous and next block if possible

    IF [ NewMemoryBlock size <= 256 bytes ]
    [
        - Look at PPNPagedLookasideList entry depth and see if we
          should keep it.
        - We return if memory block is pushed into lookaside list
    ]

    IF [ NewMemoryBlock size <= 4080 bytes ]
    [
        - Use POOL_HEADER PoolIndex variable to determine which
          PoolDescriptor must be used.
        - Insert it in the proper LIST_ENTRY array entry
        - If everything goes well, return
    ]

    - Depending on how many blocks are kept in
      MmNonPagedPoolFreeListHead, we release it to the hardware.
]
|
||
|
|
||
|
The Paged pool algorithm is very different, especially for page aligned
blocks. The management of smaller sizes should not be that far from
NonPaged, but in the assembly code we definitely saw that NonPaged and
Paged pool are totally separated. Now that you know a little more about
how NonPaged allocation works, we can talk about the exploitation part.
|
||
|
|
||
|
---[ 3.2 - Getting code execution abusing allocation code
|
||
|
|
||
|
Our main goal is getting code execution on every allocation attempt for
NonPaged pool only. This result must be achieved only by changing data
used by the targeted code. Our purpose is proving that kernel code can
serve our interest just by changing its usual data environment. Our work
is based on a new rootkit developed to gain control over NonPaged
allocation.

We start with getting code execution for allocations greater than or equal
to 1 page. As we saw in the previous part, this concerns the third and
last table.
|
||
|
|
||
|
-----[ 3.2.1 - Data corruption of MmNonPagedPoolFreeListHead
|
||
|
|
||
|
MmNonPagedPoolFreeListHead keeps page aligned memory blocks to speed up
memory allocation. It links the held memory blocks using a LIST_ENTRY
structure. This structure is common and is used in the Windows heap
library, for example.
|
||
|
|
||
|
kd> dt nt!_LIST_ENTRY
|
||
|
+0x000 Flink : Ptr32 _LIST_ENTRY
|
||
|
+0x004 Blink : Ptr32 _LIST_ENTRY
|
||
|
|
||
|
Access to MmNonPagedPoolFreeListHead is protected by the general NonPaged
queued spinlock LockQueueNonPagedPoolLock. It ensures that only one thread
on one processor can read and modify this structure.
|
||
|
|
||
|
So we need a way to get control over allocation, and the unlinking
procedure seems perfect. We can poison this linked list with a fake entry,
with the highest size possible, whose unlinking will modify the code
currently being executed. At kernel level, you can modify code as data
without any protection issues. Unlinking was used when heap exploitation
started [20], but modifying code was not possible from userland. As the
spinlock guarantees exclusivity, there is no risk of a race condition. The
created "hook" is dynamic and the code is restored immediately. The
reversing of the Page guard protection [18] shows that code is only
checked every 5 minutes. If a modification is found, the real code is
simply put back.
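
The write primitive offered by unlinking can be reproduced in user space.
In this sketch, code_slot stands for the few bytes of kernel code that
Blink points to and fake stands for the fake entry that Flink points to;
both names are hypothetical, and pointers have the width of the host
instead of 32 bits.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list_entry { struct list_entry *Flink, *Blink; } list_entry;

/* Classic unlink, the same four moves as a RemoveEntryList. */
static void unlink_entry(list_entry *e)
{
    list_entry *flink = e->Flink;
    list_entry *blink = e->Blink;
    blink->Flink = flink;   /* writes the Flink value at address Blink    */
    flink->Blink = blink;   /* writes the Blink value at address Flink+4  */
}
```

The first store is the interesting one: whatever Blink points to receives
the value of Flink. The second store shows that the fake entry itself must
be writable.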
|
||
|
|
||
|
This method has plenty of assets but also a lot of obstacles. Let's start
by enumerating all those obstacles :

- With a basic implementation of unlinking, the list becomes unwalkable.
  This breaks most uses of the table.
- We must survive the page cleaning methods and always be the first block
  on the list, otherwise we could miss some calls.
- We break the code path, and sooner or later we must return as if our
  hijacking had never been there and everything went fine.
- Processor prefetch makes self-modifying code dangerous.
|
||
|
|
||
|
Unlinking gives us a 4-byte overwrite to build an opcode and create a
redirection. In our case, we influence the current context, and a register
should point to the unlinked entry. We say "should point" without choosing
a single register because the kernel changes between versions or service
packs. Whenever we discuss context, we will keep talking about the general
situation. We chose to make a jmp [reg+XX], which is FF60XX in hex.
|
||
|
|
||
|
The effectiveness of this technique relies on keeping
MmNonPagedPoolFreeListHead walkable. A doubly linked list, such as
LIST_ENTRY, is walkable if Flink is correct. Therefore we can choose an
address for Flink such as 0xXXXX60FF, and Blink will point to the code
address. As the Intel x86 architecture is little endian, such an address
is quite easy to find; we must check the opcode offset and discard
possibilities that are too close. The next figure illustrates a poisoned
entry.
|
||
|
|
||
|
       MmNonPagedPoolFreeListHead[i]
 /------> +--------------------+
 |        |       Flink        | ---\
 |        |--------------------|    |
 | <----  |       Blink        |    |
 |        +--------------------+    |
 |        |        ...         |    |
 |        +--------------------+    |
 |  /-------------------------------/
 |  |
 |  |         Poisoned entry
 |  |     +--------------------+
 |  |     |  PreviousSize : -  |
 |  |     +--------------------+
 |  |     |   PoolIndex : -    |
 |  |     +--------------------+
 |  |     | PoolType: NonPaged |
 |  |     +--------------------+
 |  |     |   BlockSize : i    |
 |  |     +--------------------+
 |  |     |    PoolTag : -     |
 |  \---> +--------------------+
 |        | Flink : 0xYYXX60FF | <--\
 |        |--------------------|    |
 | X---   | Blink : 0x80YYYYYY |    |
 |        +--------------------+    |
 |                                  |
 |  /-------------------------------/
 |  |     Fake entry (0xYYXX60FF)
 |  |     +--------------------+
 |  |     |  PreviousSize : -  |
 |  |     +--------------------+
 |  |     |   PoolIndex : -    |
 |  |     +--------------------+
 |  |     | PoolType: NonPaged |
 |  |     +--------------------+
 |  |     |  BlockSize : < i   |
 |  |     +--------------------+
 |  |     |    PoolTag : -     |
 |  |---> +--------------------+
 |  |     | Flink : 0x80.....  | ---\
 |  |     |--------------------|    |
 |  \---- | Blink : Poisoned   |    |
 |        +--------------------+    |
 \--------------- [...] ------------/
|
||
|
|
||
|
Unlinking instruction : mov [0x80YYYYYY], 0xYYXX60FF
|
||
|
New Opcode after unlinking : jmp [reg+XX] (FF 60 XX)
|
||
|
|
||
|
[ Figure 3 - Poisoned double linked list ]
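
The link between the fake entry address and the opcode can be checked
with a few lines of C. The sketch builds 0xYYXX60FF from its parts and
stores it the way x86 does, least significant byte first. The helper names
are hypothetical; note that in the standard x86 encoding the ModRM byte
0x60 selects [eax+disp8], so another base register would need another
value in that byte.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the address 0xYYXX60FF from YY (high) and XX (disp8). */
static uint32_t make_fake_entry_address(uint8_t high, uint8_t disp8)
{
    return ((uint32_t)high << 24) | ((uint32_t)disp8 << 16) | 0x60FFu;
}

/* Stores v as x86 would: least significant byte first. */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* True when the bytes start with FF 60, i.e. jmp [eax+disp8]. */
static int is_jmp_eax_disp8(const uint8_t *code)
{
    return code[0] == 0xFF && code[1] == 0x60;
}
```

The fourth byte (YY) is written as well, so it lands in the instruction
stream right after the three-byte jmp.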
|
||
|
|
||
|
This figure shows a MmNonPagedPoolFreeListHead entry layout that guarantees
a predictable unlinking and therefore code execution. We must maintain this
layout or we will lose our position. NonPaged blocks come from two
different virtual memory ranges. The start of the second region is stored
in MmNonPagedPoolExpansionStart. A cleaning function is called from time to
time to free blocks from the expansion NonPaged pool. To avoid this
cleaning, we can use a locked Paged pool block. You can lock a memory block
with the MmProbeAndLockPages function, which makes the described memory
region resident. Another, more discreet way is to remap a NonPaged block
with the MmMapLockedPagesSpecifyCache function. It is more discreet because
this mapping lands just before the expansion NonPaged pool memory range,
whereas a locked Paged pool block gets a totally different address: a quick
look at such addresses among NonPaged ones shows a clear difference. As
virtual memory is very large, it does not take long to find an address like
0xYYXX60FF. We will not unlock those pages as long as our technique is
running.
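
Looking for an address of the form 0xYYXX60FF boils down to a test on the
low 16 bits. The following sketch is an assumption about how such a search
could be written, not the authors' code; the function names are
hypothetical and the addresses are plain 32-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* An address qualifies when its two low bytes are FF 60 in memory order,
   i.e. when (addr & 0xFFFF) == 0x60FF. */
static int fake_entry_addr_ok(uint32_t addr)
{
    return (addr & 0xFFFFu) == 0x60FFu;
}

/* First qualifying address at or above start, or 0 if none exists below
   the 4 GB limit. One candidate exists per 64 KB window. */
static uint32_t next_fake_entry_addr(uint32_t start)
{
    uint32_t cand = (start & 0xFFFF0000u) | 0x60FFu;
    if (cand < start) {
        if (cand > 0xFFFFFFFFu - 0x10000u)
            return 0;
        cand += 0x10000u;
    }
    return cand;
}

/* Displacement byte XX of the resulting jmp [eax+XX]. */
static uint8_t fake_entry_disp8(uint32_t addr)
{
    return (uint8_t)(addr >> 16);
}
```

Since a candidate exists in every 64 KB window, any locked or remapped
region of at least that size contains one, which matches the remark that
the search is quick.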
|
||
|
|
||
|
To defeat code path issues we distinguish two states. The first state is
when our block is selected. The second state is when our block is unlinked.
If we were able to return to the first state with our next fake entry
selected, we could continue walking the code as normal. We achieve that
with a generic approach. At IRQL DISPATCH_LEVEL, we corrupt a
MmNonPagedPoolFreeListHead entry with some invalid pointers. With a hook on
the page fault handler we are able to observe the first and second stages,
restore the right context each time and save the context difference between
those two states.
|
||
|
|
||
|
Assembly dump from MiAllocatePoolPages :
|
||
|
|
||
|
    lea   eax, [esi+8]  ; Stage #1 esi is selected block and esi+8 its size
    cmp   [eax], ebx    ; Check with needed size
    mov   ecx, esi
    jnb   loc_47014B
    [...]

  loc_47014B:
    sub   [esi+8], ebx
    mov   eax, [esi+8]
    shl   eax, 0Ch
    add   eax, esi
    cmp   _MmProtectFreedNonPagedPool, 0 ; Protected mode, don't care
    mov   [ebp+arg_4], eax
    jnz   short loc_47016E
    mov   eax, [esi]    ; \ Stage #2
    mov   ecx, [esi+4]  ; | Unlinking
    mov   [ecx], eax    ; | procedure
    mov   [eax+4], ecx  ; /
    jmp   short loc_470174
|
||
|
|
||
|
Now let's see how it behaves during our test run, with the page fault
handler (int 0xE) hooked :
|
||
|
|
||
|
    lea   eax, [esi+8]
                        ; Stage #1 - Check with needed size
    cmp   [eax], ebx    ; ----> PAGE FAULT esi = 0xAAAAAAAA | eax = esi + 8
                        ; - We keep EIP and all registers
                        ; - Scan all registers for 0xAAAAAAAA +/- 8
                        ;   and correct the current context. Continue.
    mov   ecx, esi
    jnb   loc_47014B
    [...]

  loc_47014B:
    sub   [esi+8], ebx
    mov   eax, [esi+8]
    shl   eax, 0Ch
    add   eax, esi
    cmp   _MmProtectFreedNonPagedPool, 0 ; Protected mode, don't care
    mov   [ebp+arg_4], eax
    jnz   short loc_47016E
    mov   eax, [esi]    ; \ Stage #2 - Unlinking procedure
    mov   ecx, [esi+4]  ; |
    mov   [ecx], eax    ; | ------> PAGE FAULT ecx = 0xBBBBBBBB
                        ; |                    eax = 0xCCCCCCCC
                        ; | - Keep EIP and sub this context from
                        ; |   Stage #1 saved context
                        ; | - Change fault registers and
                        ; |   structure pointers. Continue.
    mov   [eax+4], ecx  ; /
    jmp   short loc_470174
|
||
|
|
||
|
The fault addresses 0xAAAAAAAA, 0xBBBBBBBB and 0xCCCCCCCC must be invalid
addresses so that each access raises a page fault that our hook catches.
This test is run only once, while we still have exclusivity on all
processors. The int 0xE (page fault) handler is restored right after.
|
||
|
|
||
|
This generic technique lets us restore a valid context just before the
selected block size is checked. Once we get code execution, we apply the
context difference, change the current block register and then return to
the first stage address. It works well because our two stages are very
close: once a selected block size has been checked, the unlinking follows
directly.
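
The register bookkeeping done by the temporary page fault hook can be
sketched as follows. This is a simplified user-mode model, not the driver
code: the fault_ctx layout and the function names are hypothetical, and
only the two operations named in the listing are shown (scanning registers
for the marker +/- 8, and taking the per-register difference between the
two stages).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the general-purpose registers at fault time. */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX,
       REG_ESI, REG_EDI, REG_EBP, REG_ESP, REG_COUNT };

typedef struct {
    uint32_t reg[REG_COUNT];
    uint32_t eip;
} fault_ctx;

/* Replace every register holding marker +/- slack by real plus the same
   offset. Returns the number of registers patched. */
static int ctx_fix_marker(fault_ctx *c, uint32_t marker, uint32_t real,
                          uint32_t slack)
{
    int patched = 0;
    for (int i = 0; i < REG_COUNT; i++) {
        uint32_t delta = c->reg[i] - marker;   /* wraps below the marker */
        if (delta <= slack || (uint32_t)(0u - delta) <= slack) {
            c->reg[i] = real + delta;
            patched++;
        }
    }
    return patched;
}

/* Per-register difference between the stage #2 and stage #1 snapshots.
   The EIP kept is the stage #1 one, the address we want to return to. */
static void ctx_diff(const fault_ctx *stage1, const fault_ctx *stage2,
                     fault_ctx *diff)
{
    for (int i = 0; i < REG_COUNT; i++)
        diff->reg[i] = stage2->reg[i] - stage1->reg[i];
    diff->eip = stage1->eip;
}
```

In the stage #1 fault above, esi = 0xAAAAAAAA and eax = esi + 8, so a scan
with a slack of 8 patches exactly those two registers and leaves ebx (the
requested size) alone.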
|
||
|
|
||
|
The examples given were based on a single LIST_ENTRY of the
MmNonPagedPoolFreeListHead table, but you must poison all entries. If a
given entry is empty (except for our fake blocks), the algorithm tries the
next entry. It means we can be called more than once per allocation: if the
first entry is empty, the second entry is used and so on, and we are called
twice or more. We created a mechanism to manage multiple calls on a single
allocation. By checking the current table, we can predict a future code
execution on the same allocation and avoid executing the payload more than
once per allocation request.
|
||
|
|
||
|
Prefetch is a processor feature that retrieves more than a single
instruction from memory before executing them. Some processors use a
complex branch prediction algorithm to fetch as many instructions as
possible. After some tests, we saw that processors invalidate the code
cache when a modification occurs in cached memory addresses. Our driver
supports the case where the code modification falls right after the current
instruction. To achieve that we created a routine which calculates the
prefetch cache size and takes it into account in the next parts of our
technique. We could also search for specific instructions which flush the
prefetch cache, like a far jump, but this can only be used as an option.
|
||
|
|
||
|
This technique gives us code execution for NonPaged allocations greater
than or equal to one page. It achieves that with a stealth hook, created by
kernel code and cleaned by our routine directly afterwards. It is far from
perfect, as those allocations are not used that much. The next part
describes how this technique can be extended to gain control over all
NonPaged pool allocations.
|
||
|
|
||
|
-----[ 3.2.2 - Expend it for every size
|
||
|
|
||
|
The other lists cannot be hijacked the same way because their
synchronization mechanisms are not exclusive. Patching assembly code
becomes tricky when it can be executed by more than one thread at a time.
Our method is instead to make sure the previous technique triggers on any
allocation. Once we have control, we can find a way to restore the
ExAllocatePoolWithTag context with a correct return value. We must do that
without rewriting a single line of the memory allocator. It would be
possible to write our own allocator, but the Windows one is great and will
do the job perfectly for us.
|
||
|
|
||
|
During allocation, the lookaside list is checked first. An entry is popped
and, if it is not NULL, used. This entry comes from the GENERAL_LOOKASIDE
ListHead field, whose type is SLIST_HEADER.
|
||
|
|
||
|
kd> dt nt!_SLIST_HEADER .
   +0x000 Alignment        : Uint8B
   +0x000 Next             :
      +0x000 Next             : Ptr32 _SINGLE_LIST_ENTRY
   +0x004 Depth            : Uint2B
   +0x006 Sequence         : Uint2B
|
||
|
|
||
|
The ExInterlockedPopEntrySList function pops an entry from a SLIST_HEADER
structure. The Next field is a pointer to the next SLIST node (singly
linked list). The Depth field represents how many entries are kept in the
list. ExFreePoolWithTag compares the GENERAL_LOOKASIDE optimal depth with
the current SLIST_HEADER depth. ExAllocatePoolWithTag does not check this
field and only looks whether an entry can be popped from the Next field. To
neutralize the allocation and free procedures on the NonPaged lookaside
table, we set the Next field to NULL and the Depth field to 0xFFFF. This
state will be preserved and this table will not be used anymore.
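
The effect of that Next/Depth combination can be checked on a small model.
The union below mirrors the 32-bit layout from the dt output; everything
else (type and function names, the two predicate functions) is a
hypothetical simplification of the allocation and free paths as described
above, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* User-mode model of the 32-bit SLIST_HEADER. Next is kept as a 32-bit
   value so that the layout is the same on any host. */
typedef union {
    uint64_t Alignment;
    struct {
        uint32_t Next;      /* +0x000 */
        uint16_t Depth;     /* +0x004 */
        uint16_t Sequence;  /* +0x006 */
    } s;
} slist_header32;

/* Neutralise a lookaside list: nothing to pop, and it looks full. */
static void slist_neutralise(slist_header32 *h)
{
    h->s.Next  = 0;
    h->s.Depth = 0xFFFF;
}

/* Allocation side as described in the text: only Next is examined. */
static int alloc_can_use_lookaside(const slist_header32 *h)
{
    return h->s.Next != 0;
}

/* Free side as described in the text: the block is pushed only while the
   current depth is below the optimal depth. */
static int free_would_push(const slist_header32 *h, uint16_t optimal_depth)
{
    return h->s.Depth < optimal_depth;
}
```

Because Depth is a 16-bit field, 0xFFFF can never be below any optimal
depth, so neither path touches the list again and the state is stable.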
|
||
|
|
||
|
The extension of our technique relies entirely on subverting how the
ExpNonPagedPoolDescriptor table is used. In the previous part, we explained
how the global variable ExpNumberOfNonPagedPools is involved in this
process. It is possible to increase the number of NonPaged pools and then
play with the current KNODE color. During allocation, the KNODE color
defines which pool descriptor is used. Then, during the free procedure, the
PoolIndex field of POOL_HEADER holds the pool descriptor color.
|
||
|
|
||
|
So we can use this nice feature to our advantage. The default KNODE color
on every processor would point to an empty pool descriptor. This leads to
code execution using our base technique. If the MiAllocatePoolPages return
address is not the one used for a classical page rounded allocation, we
know that a smaller allocation is occurring. All we have to do is switch
the PRCB KNODE pointer to a copy with a custom color and call
ExAllocatePoolWithTag again. Everything related to allocation and block
management is handled by the system as it needs to be, even if it differs
between operating system versions. The PoolIndex of returned blocks will
point to our own pool descriptor, so the free procedure will work
perfectly. Let's see how it looks on a single processor.
|
||
|
|
||
|
      ExpNonPagedPoolDescriptor
        +-------------------+
        | PREVIOUS POOLDESC | <--- Kept for compatibility (0)
        | EMPTY POOLDESC    | <--- Default KNODE->color (1)
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        |        --         |
        | CUSTOM POOLDESC   | <--- Used for our allocations (16)
        +-------------------+

        [ Figure 4 - Corrupted ExpNonPagedPoolDescriptor ]
                     [ on single processor ]
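
The arrangement of Figure 4 can be modelled with a few lines of C. This is
a deliberately simplified sketch: the structures, counters and function
names are hypothetical, the descriptor indexes (1 and 16) are taken from
the figure, and the real selection logic differs between Windows versions.
It only illustrates the two facts used here: the color picks the descriptor
at allocation time, and the block header picks it at free time.

```c
#include <assert.h>
#include <stdint.h>

enum { POOL_DESC_COUNT = 17, DEFAULT_COLOR = 1, CUSTOM_COLOR = 16 };

typedef struct { int free_blocks; int allocs; int frees; } pool_desc;
typedef struct { uint8_t pool_index; } block_header;

typedef struct {
    pool_desc desc[POOL_DESC_COUNT];
    uint8_t   node_color;   /* stands in for the current KNODE color */
} pool_model;

/* Allocation: the color selects the descriptor. Returns 0 when it is
   empty, which is where the real kernel would ask for new pages. */
static int model_alloc(pool_model *m, block_header *out)
{
    pool_desc *d = &m->desc[m->node_color];
    if (d->free_blocks == 0)
        return 0;
    d->free_blocks--;
    d->allocs++;
    out->pool_index = m->node_color;
    return 1;
}

/* Free: the descriptor comes from the block header, not from the color. */
static void model_free(pool_model *m, const block_header *b)
{
    m->desc[b->pool_index].free_blocks++;
    m->desc[b->pool_index].frees++;
}

/* Hook-side retry: switch to the custom color, allocate, restore. */
static int model_alloc_via_custom(pool_model *m, block_header *out)
{
    uint8_t saved = m->node_color;
    int ok;
    m->node_color = CUSTOM_COLOR;
    ok = model_alloc(m, out);
    m->node_color = saved;
    return ok;
}
```

In this model an allocation with the default color always fails, which is
the moment we gain execution, and the retry succeeds from descriptor 16.
A later free goes back to descriptor 16 whatever the current color is.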
|
||
|
|
||
|
This setup is just an example and you can manage the arrangement as you
want. We could transfer previous blocks from older pool descriptors into
our own and then receive freed blocks. It is also possible to use multiple
pool descriptors and so on. Beware of system pool descriptor recycling as
it can lead to strange behavior, especially on multi-processor
architectures.
|
||
|
|
||
|
Once we have our freshly allocated block, we must return to the
ExAllocatePoolWithTag return address. MiAllocatePoolPages has been called
to retrieve a new page and fill the current pool descriptor with it.
Obviously we cannot return normally and let the page allocation occur. On
the Intel x86 architecture the stack is used to store local variables,
arguments and saved registers. The Windows compiler starts by reserving
space for local variables and then pushes each register before modifying
it. The next figure shows our stack configuration once we have code
execution.
|
||
|
|
||
|
          top
   +--------------------+
   | Our stack elements |      Restore assembly example
   +--------------------+ <------ /---------------\
   |                    |         | pop ecx       |
   |  Saved registers   |         | pop ebx       |
   |                    |         | pop esi       |
   +--------------------+         | leave         |
   |                    |         | retn 0Ch      |
   |                    |         \---------------/
   |                    |                 |
   |                    |                 |
   |  Stack variables   |                 |
   |                    |                 |
   |                    |                 |
   |                    |                 |
   +--------------------+         [new stack level]
   |     Saved EBP      |                 |
   +--------------------+                 |
   |   Return Address   |                 |
   +--------------------+                 |
   |                    |                 |
   | Function arguments |                 |
   |                    |                 |
   +--------------------+ <---------------/
         bottom

        [ Figure 5 - Stack context after code execution ]
                   [ ~ small blocks case ~ ]
|
||
|
|
||
|
The restore assembly part shows the assembly, taken from the current
function, which perfectly restores the context. It does not necessarily
correspond to the first series of pop instructions before a return: there
is a significant risk that some registers have not been pushed yet. It is
possible to deduce the number of pushed registers by looking at the
function prologue, where stack variables are reserved. With the Windows
compiler it is quite simple, and a simple disassembly analysis of the
needed pop instructions does the job. It must be done for
MiAllocatePoolPages and ExAllocatePoolWithTag. We change the return address
stored on the stack and go to the deduced MiAllocatePoolPages address. The
last step is setting the eax register for the return value. Both functions
return a value and preserve the eax value. Our analyzer is dynamic and
records each pop and its register. That is why we can restore the proper
context even if it changes between versions.
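
A minimal epilogue scanner gives an idea of what recording "each pop and
its register" can look like. This is a sketch under assumptions, not the
authors' analyzer: it only understands the four one-opcode instructions of
the restore example in Figure 5 (pop r32, leave, ret, retn imm16) and gives
up on anything else. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_POPS = 8 };

typedef struct {
    uint8_t  pops[MAX_POPS]; /* register numbers in pop order:
                                0=eax 1=ecx 2=edx 3=ebx 4=esp 5=ebp
                                6=esi 7=edi */
    int      pop_count;
    int      has_leave;      /* 1 if a `leave` precedes the return */
    uint16_t ret_bytes;      /* argument bytes released by `retn imm16` */
} epilogue_info;

/* Decode a run of pop r32 / leave / ret / retn imm16.
   Returns 1 when a return instruction is reached, 0 otherwise. */
static int scan_epilogue(const uint8_t *code, size_t len, epilogue_info *out)
{
    size_t i = 0;
    out->pop_count = 0;
    out->has_leave = 0;
    out->ret_bytes = 0;
    while (i < len) {
        uint8_t op = code[i];
        if (op >= 0x58 && op <= 0x5F) {             /* pop r32 */
            if (out->pop_count == MAX_POPS)
                return 0;
            out->pops[out->pop_count++] = (uint8_t)(op - 0x58);
            i += 1;
        } else if (op == 0xC9) {                    /* leave */
            out->has_leave = 1;
            i += 1;
        } else if (op == 0xC3) {                    /* ret */
            return 1;
        } else if (op == 0xC2 && i + 2 < len) {     /* retn imm16 */
            out->ret_bytes = (uint16_t)(code[i + 1] | (code[i + 2] << 8));
            return 1;
        } else {
            return 0;                               /* unknown opcode */
        }
    }
    return 0;
}
```

The restore example of Figure 5 assembles to 59 5B 5E C9 C2 0C 00; the
scanner records ecx, ebx and esi in that order, a leave, and 0Ch bytes of
arguments released by the return.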
|
||
|
|
||
|
The Windows compiler is really easy to predict and does not produce overly
strange assembly layouts. This technique is theoretically possible on any
assembly code that follows the stdcall specification. The approach could
differ with other compilers.
|
||
|
|
||
|
---[ 3.3 - Exploit our position
|
||
|
|
||
|
This article presents a way of subverting the Windows kernel by modifying
only data. No function pointers, no static hooking or other classical
techniques. That could exempt us from any further explanation, but it would
not be complete without some concrete examples. I personally believe that
the only limitation here is imagination.
|
||
|
|
||
|
-----[ 3.3.1 - Generic stack redirection
|
||
|
|
||
|
Allocation occurs in so many places that you must rely on known contexts
and functions. Once everything is set up and before releasing exclusivity,
a stack redirection database can be created.
|
||
|
|
||
|
The first way to do this is to call a handler if stack backtracing reveals
a specific function. Stack backtracing shows only return addresses and not
which function made the call. Debuggers resolve those functions by deep
analysis or symbol checking. Implementing those features would take too
much time, so it is better to target a specific return address on the
ExAllocatePoolWithTag stack frame. It will definitely improve check speed.
To do that, we indicate to our stack redirection API that we target a
specific function, then launch a normal call or procedure that will lead to
our function. Every allocation during this time will reveal the relevant
stack backtraces.
|
||
|
|
||
|
Let's say we target an IRP and we know which function handles it by looking
at the IRP dispatch table. We also know by reversing that it will allocate
a NonPaged block. By launching an I/O request, our API can register the
resulting NonPaged allocation calls and recognize them later.
|
||
|
|
||
|
In the wild, it will call the appropriate handler with sub context
information. Sometimes getting a context is not enough. The second way
relies on the same principles but modifies the stack to ensure our handler
is called once the function ends. Efficiency depends on what your target is
and how you modify it.
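
The first way, matching return addresses against a prepared database, can
be sketched as a plain table lookup. The types and names below are
hypothetical and the backtrace is given as an array of 32-bit return
addresses; how that array is captured from the ExAllocatePoolWithTag stack
frame is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*redirect_handler)(uint32_t return_address);

typedef struct {
    uint32_t         return_address; /* recorded during the traced call */
    redirect_handler handler;
} redirect_entry;

/* Example handler, for illustration only: returns the low byte of the
   return address it was called for. */
static int sample_handler(uint32_t return_address)
{
    return (int)(return_address & 0xFFu);
}

/* Walk a captured backtrace and return the first database entry whose
   return address appears in it, or NULL if there is none. */
static const redirect_entry *redirect_lookup(const redirect_entry *db,
                                             size_t db_len,
                                             const uint32_t *backtrace,
                                             size_t depth)
{
    for (size_t f = 0; f < depth; f++)
        for (size_t e = 0; e < db_len; e++)
            if (db[e].return_address == backtrace[f])
                return &db[e];
    return NULL;
}
```

Passing a depth of 1 restricts the check to the immediate return address,
which is the cheap variant recommended above; a larger depth trades speed
for a wider match.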
|
||
|
|
||
|
-----[ 3.3.2 - Userland process code injection
|
||
|
|
||
|
This technique can also be used to inject code in userland to subvert
trusted applications. NonPaged allocations occur a lot in kernel mode and
happen in every process. Some kernel drivers like win32k.sys call userland
many times. This call is achieved by the KeUserModeCallback function [35].
It modifies the userland stack to switch temporarily for a call in
userland. The available functions are limited by a table.
|
||
|
|
||
|
Userland injection from the kernel should not be resident and should only
concern known trusted applications such as browsers. Injection can be done
on explorer.exe as well, to launch a hidden instance of a trusted program.
The KeUserModeCallback algorithm can easily be rewritten, or copied and
then relocated. The redirection table could be subverted to redirect the
call. We can also think about exploiting userland calls. It does not make
any sense to add checks on those available functions.
|
||
|
|
||
|
--[ 4 - Detection
|
||
|
|
||
|
This article does not try to convince you that subverting the IDT or the
allocation mechanism using advanced techniques is the future. Most
detection tools only indicate whether a rootkit may or may not be on the
computer. They struggle to identify which module is responsible, and they
detect antivirus or firewall software as rootkits. A protection layer could
even detect itself as a rootkit because it does everything a rootkit does,
so do not ask it to block or uninstall a rootkit. Rootkit papers
demonstrate many great ways to easily bypass those protections. But we do
not see those techniques much in the wild, simply because rootkits do not
need them for the moment.
|
||
|
|
||
|
Detecting software behavior modification could be part of a Verifiable
Operating System [36]. It would involve basic checks on known memory
structures, such as checking the integrity of LIST_ENTRY structures and
correcting them if needed. We can blame rootkit protections as much as we
want, but detecting rootkits on a closed operating system is almost
impossible. Giving more information about kernel components will certainly
lead to more sophisticated attacks. On the other hand, it could reduce the
attack surface. This is especially true on a defence oriented operating
system. The next protection improvements should come from the operating
system itself.
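
As an illustration of such a basic check, a doubly linked list can be
validated by verifying that every node is pointed back to by both of its
neighbours. The sketch below is a user-mode model with hypothetical names;
it uses native pointers instead of kernel addresses and bounds the walk
with a maximum node count chosen by the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

static void list_init(list_entry *head)
{
    head->Flink = head->Blink = head;
}

static void list_insert_tail(list_entry *head, list_entry *e)
{
    e->Flink = head;
    e->Blink = head->Blink;
    head->Blink->Flink = e;
    head->Blink = e;
}

/* Returns 1 if every node reached from head satisfies
   e->Flink->Blink == e and e->Blink->Flink == e and the walk comes back
   to head within max_nodes steps, 0 otherwise. */
static int list_is_consistent(const list_entry *head, size_t max_nodes)
{
    const list_entry *e = head;
    for (size_t i = 0; i <= max_nodes; i++) {
        if (e->Flink == NULL || e->Blink == NULL)
            return 0;
        if (e->Flink->Blink != e || e->Blink->Flink != e)
            return 0;
        e = e->Flink;
        if (e == head)
            return 1;
    }
    return 0;   /* never came back to head: treat as corrupted */
}
```

A poisoned entry as in Figure 3 fails this test, since its Blink points
into code rather than to a node whose Flink points back to it.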
|
||
|
|
||
|
Now that there are hardware improvements for virtualisation, such as
hypervisors, there will be hardware extensions to detect and protect
against rootkits. They offer real control over operating system behavior
without advanced research on the kernel layout. Some protection techniques
that were impossible to implement in a Windows environment, like PaX, could
rely on those hardware features. Our techniques could be detected by
registering and monitoring some specific events on the processor. It is
possible to do that today, but the performance cost is significant.
|
||
|
|
||
|
Our attacks could be blocked using targeted protections such as signatures.
An attack is measured by how much time it takes to create a generic
protection against it. In this area, PatchGuard is an important
improvement.
|
||
|
|
||
|
--[ 5 - Conclusion
|
||
|
|
||
|
The techniques in this paper were made to show that elegant software
hijacking can still evade most protections and avoid performance issues or
unstable behavior. Even so, these techniques are hardly reliable and should
be considered only as a technical proof of concept. New protections are
either not efficient enough or not present. They do not represent a threat
for a rootkit which targets millions of computers. Reversing is an
important tool for improving software rootkit techniques. Detecting that a
rootkit is present should not be enough. A protection which cannot
uninstall a rootkit or prevent infection is useless. Driver signatures were
a good idea, as they were designed to stop current infection vectors. But
infection prevention also has to cover local kernel exploitation. Generic
detection of those attacks would need an important improvement in
anti-rootkit protections and operating system design.
|
||
|
|
||
|
--[ 6 - References
|
||
|
|
||
|
[1] Holy Father, Invisibility on NT boxes, How to become unseen on Windows
|
||
|
NT (Version: 1.2)
|
||
|
http://vx.netlux.org/lib/vhf00.html
|
||
|
|
||
|
[2] Holy Father, Hacker Defender
|
||
|
https://www.rootkit.com/vault/hf/hxdef100r.zip
|
||
|
|
||
|
[3] 29A
|
||
|
http://vx.netlux.org/29a
|
||
|
|
||
|
[4] Greg Hoglund, NT Rootkit
|
||
|
https://www.rootkit.com/vault/hoglund/rk_044.zip
|
||
|
|
||
|
[5] fuzen_op, FU
|
||
|
http://www.rootkit.com/project.php?id=12
|
||
|
|
||
|
[6] Peter Silberman, C.H.A.O.S, FUto
|
||
|
http://uninformed.org/?v=3&a=7
|
||
|
|
||
|
[7] Eeye, Bootroot
|
||
|
http://research.eeye.com/html/tools/RT20060801-7.html
|
||
|
|
||
|
[8] Eeye, Pixie
|
||
|
http://research.eeye.com/html/papers/download/
|
||
|
eEyeDigitalSecurity_Pixie%20Presentation.pdf
|
||
|
|
||
|
[9] Joanna Rutkowska and Alexander Tereshkin, Blue Pill project
|
||
|
http://bluepillproject.org/
|
||
|
|
||
|
[10] Frank Boldewin, A Journey to the Center of the Rustock.B Rootkit
|
||
|
http://www.reconstructer.org/papers/
|
||
|
A%20Journey%20to%20the%20Center%20of%20the%20Rustock.B%20Rootkit.zip
|
||
|
|
||
|
[11] Frank Boldewin, Peacomm.C - Cracking the nutshell
|
||
|
http://www.reconstructer.org/papers/
|
||
|
Peacomm.C%20-%20Cracking%20the%20nutshell.zip
|
||
|
|
||
|
[12] Stealth MBR rootkit
|
||
|
http://www2.gmer.net/mbr/
|
||
|
|
||
|
[13] EP_X0FF and MP_ART, Unreal.A, bypassing modern Antirootkits
|
||
|
http://www.rootkit.com/newsread.php?newsid=647
|
||
|
|
||
|
[14] AK922 : Bypassing Disk Low Level Scanning to Hide File
|
||
|
http://rootkit.com/newsread.php?newsid=783
|
||
|
|
||
|
[15] CardMagic and wowocock, DarkSpy
|
||
|
http://www.fyyre.net/~cardmagic/index_en.html
|
||
|
|
||
|
[16] pjf, IceSword
|
||
|
http://pjf.blogone.net
|
||
|
|
||
|
[17] Gmer
|
||
|
http://www.gmer.net/index.php
|
||
|
|
||
|
[18] PatchGuard papers (Uninformed) :
|
||
|
|
||
|
- Bypassing PatchGuard on Windows x64 by skape & Skywing
|
||
|
http://www.uninformed.org/?v=all&a=14&t=sumry
|
||
|
|
||
|
- Subverting PatchGuard Version 2 by Skywing
|
||
|
http://www.uninformed.org/?v=all&a=28&t=sumry
|
||
|
|
||
|
- PatchGuard Reloaded: A Brief Analysis of PatchGuard Version 3 by Skywing
|
||
|
http://www.uninformed.org/?v=all&a=38&t=sumry
|
||
|
|
||
|
[19] Greg Hoglund, Kernel Object Hooking Rootkits (KOH Rootkits)
|
||
|
http://www.rootkit.com/newsread.php?newsid=501
|
||
|
|
||
|
[20] Windows Heap Overflows - David Litchfield
|
||
|
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-litchfield/
|
||
|
bh-win-04-litchfield.ppt
|
||
|
|
||
|
[21] Bypassing Klister 0.4 With No Hooks or Running a Controlled
|
||
|
Thread Scheduler by 90210 - 29A
|
||
|
http://vx.netlux.org/29a/magazines/29a-8.rar
|
||
|
|
||
|
[22] Microsoft, Debugging Tools for Windows
|
||
|
http://www.microsoft.com/whdc/devtools/debugging/default.mspx
|
||
|
|
||
|
[23] Kad, Phrack 59, Handling Interrupt Descriptor Table for fun and profit
|
||
|
http://phrack.org/issues.html?issue=59&id=4#article
|
||
|
|
||
|
[24] Wikipedia, Southbridge
|
||
|
http://en.wikipedia.org/wiki/Southbridge_(computing)
|
||
|
|
||
|
[25] Wikipedia, Northbridge
|
||
|
http://en.wikipedia.org/wiki/Northbridge_%28computing%29
|
||
|
|
||
|
[26] The NT Insider, Stop Interrupting Me -- Of PICs and APICs
|
||
|
http://www.osronline.com/article.cfm?article=211 (login required)
|
||
|
|
||
|
[27] Russinovich, Solomon, Microsoft Windows Internals, Fourth Edition
|
||
|
Chapter 3. System Mechanisms -> Trap Dispatching
|
||
|
|
||
|
[28] MSDN, KeyboardClassServiceCallback
|
||
|
http://msdn2.microsoft.com/en-us/library/ms793303.aspx
|
||
|
|
||
|
[29] Clandestiny, Klog
|
||
|
http://www.rootkit.com/vault/Clandestiny/Klog%201.0.zip
|
||
|
|
||
|
[30] Alexander Tereshkin, Rootkits: Attacking Personal Firewalls
|
||
|
www.blackhat.com/presentations/bh-usa-06/BH-US-06-Tereshkin.pdf
|
||
|
|
||
|
[31] MSDN, NdisMIndicateReceivePacket
|
||
|
http://msdn2.microsoft.com/en-us/library/aa448038.aspx
|
||
|
|
||
|
[32] Subverting VistaTM Kernel For Fun And Profit by Joanna Rutkowska
|
||
|
http://invisiblethings.org/papers/
|
||
|
joanna%20rutkowska%20-%20subverting%20vista%20kernel.ppt
|
||
|
|
||
|
[33] Vista RC2 vs. pagefile attack by Joanna Rutkowska
|
||
|
http://theinvisiblethings.blogspot.com/2006/10/
|
||
|
vista-rc2-vs-pagefile-attack-and-some.html
|
||
|
|
||
|
[34] Russinovich, Solomon, Microsoft Windows Internals, Fourth Edition
|
||
|
Chapter 7. Memory Management -> System Memory Pools
|
||
|
|
||
|
[35] KeUserModeCallback ref - "Ring0 under WinNT/2k/XP" by Ratter - 29A
|
||
|
http://www.illmob.org/files/text/29a7/Articles/29A-7.003
|
||
|
|
||
|
[36] Joanna Rutkowska - Towards Verifiable Operating Systems
|
||
|
http://theinvisiblethings.blogspot.com/2007/01/
|
||
|
towards-verifiable-operating-systems.htm
|