mirror of https://github.com/fdiskyou/Zines.git
381 lines
18 KiB
Plaintext
381 lines
18 KiB
Plaintext
Loop Detection
|
|
Peter Silberman
|
|
peter.silberman@gmail.com
|
|
|
|
1) Foreword
|
|
|
|
Abstract: During the course of this paper the reader will gain new knowledge
|
|
about previous and new research on the subject of loop detection. The topic of
|
|
loop detection will be applied to the field of binary analysis and a case study
|
|
will given to illustrate its uses. All of the implementations provided in this
|
|
document have been written in C/C++ using Interactive Disassembler (IDA)
|
|
plug-ins.
|
|
|
|
Thanks: The author would like to thank Pedram Amini, thief, Halvar Flake,
|
|
skape, trew, Johnny Cache and everyone else at nologin who help with ideas, and
|
|
kept those creative juices flowing.
|
|
|
|
|
|
2) Introduction
|
|
|
|
The goal of this paper is to educate the reader both about why loop detection
|
|
is important and how it can be used. When a security researcher thinks of
|
|
insecure coding practices, things like calls to strcpy and sprintf are some of
|
|
the first things to come to mind. These function calls are considered low
|
|
hanging fruit. Some security researchers think of integer overflows or
|
|
off-by-one copy errors as types of vulnerabilities. However, not many people
|
|
consider, or think to consider, the mis-usage of loops as a security problem.
|
|
With that said, loops have been around since the beginning of time (e.g. first
|
|
coding languages). The need for a language to iterate over data to analyze
|
|
each object or character has always been there. Still, not everyone thinks to
|
|
look at a loop for security problems. What if a loop doesn't terminate
|
|
correctly? Depending on the operation the loop is performing, it's possible
|
|
that it could corrupt surrounding memory regions if not properly managed. If
|
|
the loop frees memory that no longer exists or is not memory, a double-free bug
|
|
could've been found. These are all things that could, and do, happen in a
|
|
loop.
|
|
|
|
As the low hanging fruit is eliminated in software by security researchers and
|
|
companies doing decent to moderate QA testing, the security researchers have to
|
|
look elsewhere to find vulnerabilities in software. One area that has only
|
|
been touched on briefly in the public relm, is how loops operate when
|
|
translated to binaries BugScan is an example of a company that has implemented
|
|
"buffer iteration" detection but hasn't talked publically about it.
|
|
http://www.logiclibrary.com. The reader may ask: why would one want to look at
|
|
loops? Well, a lot of companies implement their own custom string routines,
|
|
like strcpy and strcat, which tend to be just as dangerous as the standard
|
|
string routines. These functions tend to go un-analyzed because there is no
|
|
quick way to say that they are copying a buffer. Due to this reason, loop
|
|
detection can help the security research identify areas of interest. During
|
|
the course of this article the reader will learn of the different ways to
|
|
detect loops using graph analysis, how to implement loop detection, see a new
|
|
loop detection IDA plug-in, and a case study that will tie it all together.
|
|
|
|
|
|
3) Algorithms Used to Detect Loops
|
|
|
|
A lot of research has been done on the subject of loop detection. The
|
|
research, however, was not done for the purpose of finding and exploiting
|
|
vulnerabilities that exist inside of loops. Most research has been done with
|
|
an interest in recognizing and optimizing loops A good article about loop
|
|
optimization and compiler optimization is
|
|
http://www.cs.princeton.edu/courses/archive/spring03/cs320/notes/loops.pdf .
|
|
Research on the optimization of loops has led scientists to classify various
|
|
types of loops. There are two distinct categories to which any loop will
|
|
belong. Either the loop will be an irreducible loop Irreducible loops are
|
|
defined as "loops with multiple entry [points]"
|
|
(http://portal.acm.org/citation.cfm?id=236114.236115) or a reducible loop
|
|
Reducible loops are defined as "loops with one entry [point]"
|
|
(http://portal.acm.org/citation.cfm?id=236114.236115). Given that there are
|
|
two different distinct categories, it stands to reason that the two types of
|
|
loops are detected in different fashions. Two popular papers on loop detection
|
|
are Interval Finding Algorithm and Identifying Loops Using DJ Graphs. This
|
|
document will cover the most widely accepted theory on loop detection.
|
|
|
|
|
|
3.1) Natural Loop Detection
|
|
|
|
One of the most well known algorithms for loop detection is demonstrated in the
|
|
book Compilers Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi
|
|
and Jeffrey D. Ullman. In this algorithm, the authors use a technique that
|
|
consists of two components to find natural loops A natural loop "Has a single
|
|
entry point. The header dominates all nodes in the loop."
|
|
(http://www-2.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15745-s03/public/lectures/L7_handouts.pdf
|
|
all loops are not natural loops.
|
|
|
|
The first component of natural loop detection is to build a dominator tree out
|
|
of the control flow graph (CFG). A dominator can be found when all paths to a
|
|
given node have to go through another node. A control flow graph is essentially
|
|
a map of code execution with directional information. The algorithm in the
|
|
book calls for the finding of all the dominators in a CFG. Let's look at the
|
|
actual algorithm.
|
|
|
|
Starting from the entry node, the algorithm needs to check if there is a path
|
|
to the slave from the entry node. This path has to avoid the master node. If
|
|
it is possible to get to the slave node without touching the master node, it
|
|
can be determined that the master node does not dominate the slave node. If it
|
|
is not possible to get to the slave node, it is determined that the master node
|
|
does dominate the slave. To implement this routine the user would call the
|
|
is_path_to(ea_t from, ea_t to, ea_t avoid) function included in loopdetection.cpp.
|
|
This function will essentially check to see if there is a path from the
|
|
parameter from that can get to the parameter to, and will avoid the node
|
|
specified in avoid. Figure illustrates this algorithm.
|
|
|
|
As the reader can see from Figure 1, there is a loop in this CFG. Let B to C
|
|
to D be the path of nodes that create a loop, it will be represented as
|
|
B->C->D. There is also another loop from nodes B->D. Using the algorithm
|
|
described above it is possible to verify which of these nodes is involved in
|
|
the natural loop. The first question to ask is if the flow of the program can
|
|
get from A to D while avoiding B. As the reader can see, it is impossible in
|
|
this case to get to D avoiding B. As such, a call to the is_path_to function
|
|
will tell the user that B Dominates D. This can be represented as B Dom D, and
|
|
B Dom C. This is due to the fact that there is no way to reach C or D without
|
|
going through B. One question that might be asked is how exactly does this
|
|
demonstrate a loop? The answer is that, in fact, it doesn't. The second
|
|
component of the natural loop detection checks to see if there is a link, or
|
|
backedge, from D to B that would allow the flow of the program to return to
|
|
node B to complete the loop. In the case of B->D there exists a backedge that
|
|
does complete the loop.
|
|
|
|
|
|
3.2) Problems with Natural Loop Detection
|
|
|
|
There is a very big problem with natural loops. The problem is with the
|
|
natural loop definition which is ``a single entry point whose header dominates
|
|
all the nodes in the loop''. Natural loop detection does not deal with
|
|
irreducible loops, as defined previously. This problem can be demonstrated in
|
|
figure
|
|
|
|
As the reader can see both B and D are entry points into C. Also neither D nor
|
|
B dominates C. This throws a huge wrench into the algorithm and makes it only
|
|
able to pick up loops that fall under the specification of a natural loop or
|
|
reducible loop It is important to note that it is next that it is next to
|
|
impossible to reproduce
|
|
|
|
|
|
4) A Different Approach to Loop Detection
|
|
|
|
The reader has seen how to detect dominators within a CFG and how to use that
|
|
as a component to find natural loops. The previous chapter described why
|
|
natural loop detection was flawed when trying to detect irreducible loops. For
|
|
binary auditing, the tool will need to be able to pick up all loops and then
|
|
let the user deduce whether or not the loops are interesting. This chapter
|
|
will introduce the loop algorithm used in the IDA plug-in to detect loops.
|
|
|
|
To come up with an algorithm that was robust enough to detect both loops in the
|
|
irreducible and reducible loop categories, the author decided to modify the
|
|
previous definition of a natural loop. The new definition reads "a loop can
|
|
have multiple entry points and at least one link that creates a cycle." This
|
|
definition avoids the use of dominators to detect loops in the CFG.
|
|
|
|
The way this alternative algorithm works is by first making a call to the
|
|
is_reference_to(ea_t to, ea_t ref) function. The function is_reference_to will
|
|
determine if there is a reference from the ea_t specified by ref to the
|
|
parameter to. This check within the loop detection algorithm determines if
|
|
there is a backedge or link that would complete a loop. The reason this check
|
|
is done first is for speed. If there is no reference that would complete a
|
|
loop then there is no reason to call is_path_to, thus preventing unnecessary
|
|
calculations. However, if there is a link or backedge, a call to the
|
|
overloaded function is_path_to(ea_t from, ea_t to) is used to determine if the
|
|
nodes that are being examined can even reach each other. The is_path_to function
|
|
simulates all possible code execution conditions by following all possible
|
|
edges to determine if the flow of execution could ever reach parameter to when
|
|
starting at parameter from. The function is_path_to(ea_t from, ea_t to) returns
|
|
one (true) if there is indeed a path going from from to to. With both of these
|
|
functions returning one, it can be deduced that these nodes are involved in the
|
|
loop.
|
|
|
|
|
|
4.1) Problems with new approach
|
|
|
|
In every algorithm there can exists small problems, that make the algorithm far
|
|
from optimal. This problem applies to the new approach presented above. The
|
|
algorithm presented above has not been optimized for performance. The algorithm
|
|
runs in a time of O(N2), which carries quite a load if there are more than 600
|
|
or so nodes.
|
|
|
|
The reason that the algorithm is so time consuming is that instead of
|
|
implementing a Breadth First Search (BFS), a Depth First Search (DFS) was
|
|
implemented, in the is_path_to function which computes all possible paths to and
|
|
from a given node. Depth First Search is much more expensive than Breadth First
|
|
Search, and because of that the algorithm may in some rare cases suffer. If
|
|
the reader is interested in how to implement a more efficient algorithm for
|
|
finding the dominators, the reader should check out Compiler Design
|
|
Implementation by Steven S. Muchnick.
|
|
|
|
|
|
It should be noted that in future of this plug-in there will be optimizations
|
|
made to the code. The optimizations will specifically deal new implementations
|
|
of a Breadth First Search instead of the Depth First Search, as well as other
|
|
small optimizations.
|
|
|
|
|
|
5) Loop Detection Using IDA Plug-ins
|
|
|
|
In every algorithm and theory there exists small problems. It is important to
|
|
understand the algorithm presented
|
|
|
|
The plug-in described in this document uses the Function Analyzer Class
|
|
(functionanalyzer) that was developed by Pedram Amini
|
|
(http://labs.idefense.com) as the base class. The Loop Detection
|
|
(loopdetection) class uses inheritance to glean its attributes from Function
|
|
Analyzer. The reason inheritance is used is primarily for ease of development.
|
|
Inheritance is also used so that instead of having to re-add functions to a new
|
|
version of Function Analyzer, the user only has to replace the old file. The
|
|
final reason inheritance is used is for code conformity, which is accomplished
|
|
by creating virtual functions. These virtual functions allow the user to
|
|
override methods that are implemented in the Function Analyzer. This means
|
|
that if a user understands the structure of function analyzer, they should not
|
|
have a hard time understanding loop detections structure.
|
|
|
|
|
|
5.1) Plug-in Usage
|
|
|
|
To best utilize this plug-in the user needs to understand its features and
|
|
capabilities. When a user runs the plug-in they will be prompted with a window
|
|
that is shown in figure . Each of the options shown in figure are described
|
|
individually.
|
|
|
|
|
|
Graph Loop
|
|
|
|
This feature will visualize the loops, marking the entry of a loop with green
|
|
border, the exit of a loop with a red border and a loop node with a yellow
|
|
border. Highlight Function Calls This option allows the user to highlight the
|
|
background of any function call made within the loop. The highlighting is done
|
|
within IDA View.
|
|
|
|
|
|
Output Stack Information
|
|
|
|
This is a feature that is only enabled with the graph loop option. When this
|
|
option is enabled the graph will contain information about the stack of the
|
|
function including the variables name, whether or not it is an argument, and
|
|
the size of the variable. This option is a great feature for static auditing.
|
|
|
|
|
|
Highlight Code
|
|
|
|
This option is very similar to Highlight Function except instead of just
|
|
highlighting function calls within loops it will highlight all the code that is
|
|
executed within the loops. This makes it easier to read the loops in IDA View
|
|
|
|
|
|
Verbose Output
|
|
|
|
This feature allows the user to see how the program is working and will give
|
|
more information about what the plug-in is doing.
|
|
|
|
|
|
Auto Commenting
|
|
|
|
This option adds comments to loops nodes, such as where the loop begins, where
|
|
it exits, and other useful information so that the user doesn't have to
|
|
continually look at the graph.
|
|
|
|
|
|
All Loops Highlighting of Functions
|
|
|
|
This feature will find every loop within the IDA database. It will then
|
|
highlight any call to any function within a loop. The highlighting is done
|
|
within the IDA View making navigation of code easier.
|
|
|
|
|
|
All Loops Highlighting of Code
|
|
|
|
This option will find every loop within the database. It will then highlight
|
|
all segments of code involved in a loop. The highlighting of code will allow
|
|
for easier navigation of code within the IDA View.
|
|
|
|
|
|
Natural Loops
|
|
|
|
This detection feature allows the user to only see natural loops. It may not
|
|
pick up all loops but is an educational implementation of the previously
|
|
discussed algorithm.
|
|
|
|
|
|
Recursive Function Calls
|
|
|
|
This detection option will allow the user to see where recursive function calls
|
|
are located.
|
|
|
|
|
|
5.2) Known Issues
|
|
|
|
There a couple of known issues with this plug-in. It does not deal with rep*
|
|
instructions, nor does it deal with mov** instructions that might result in
|
|
copied buffers. Future versions will deal with these instructions, but since
|
|
it is open-sourced the user can make changes as they see fit. Another issue is
|
|
that of ``no-interest''. By this the author means detecting loops that aren't
|
|
of interest or don't pose a security risk. These loops, for example, may be
|
|
just counting loops that don't write memory. Halvar Flake describes this topic
|
|
in his talk that was given at Blackhat Windows 2004. Feel free to read his
|
|
paper and make changes accordingly. The author will also update the plug-in
|
|
with these options at a later date.
|
|
|
|
|
|
5.3) Case Study: Zone Alarm
|
|
|
|
For a case study the author chose Zone Alarm's vsdatant.sys driver. This
|
|
driver does a lot of the dirty work for Zone Alarm such as packet filtering,
|
|
application monitoring, and other kernel level duties. Some may wonder why it
|
|
would be worthwhile to find loops in a driver. In Zone Alarm's case, the user
|
|
can hope to find miscalculations in lengths where they didn't convert a signed
|
|
to unsigned value properly and therefore may cause an overflow when looping.
|
|
Anytime an application takes data in remotely that may be type-casted at some
|
|
point, there is always a great chance for loops that overflow their bounds.
|
|
|
|
When analyzing the Zone Alarm driver the user needs to select certain options
|
|
to get a better idea of what is going on with loops. First, the user should
|
|
select verbose output and All Loops Highlighting of Functions to see if there
|
|
are any dangerous function calls within the loop. This is illustrated in
|
|
figure .
|
|
|
|
After running through the loop detection phase, some interesting results are
|
|
found that are shown in figure .
|
|
|
|
Visiting the address 0x00011a21 in IDA shows the loop. To begin, the reader
|
|
will need to find the loop's entry point, which is at:
|
|
|
|
.text:00011A1E jz short loc_11A27
|
|
|
|
At the loop's entry point, the reader will notice:
|
|
|
|
.text:00011A27 push 206B6444h ; Tag
|
|
.text:00011A2C push edi ; NumberOfBytes
|
|
.text:00011A2D push 1 ; PoolType
|
|
.text:00011A2F call ebp ;ExAllocatePoolWithTag
|
|
|
|
At this point, the reader can see that every time the loop passes through its
|
|
entry point it will allocate memory. To determine if the attacker can cause a
|
|
double free error, further investigation is needed.
|
|
|
|
.text:00011A31 mov esi, eax
|
|
.text:00011A33 test esi, esi
|
|
.text:00011A35 jz short loc_11A8F
|
|
|
|
If the memory allocation within the loop fails, the loop terminates correctly.
|
|
The next call in the loop is to ZwQuerySystemInformation which tries to acquire
|
|
the SystemProcessAndThreadsInformation.
|
|
|
|
.text:00011A46 mov eax, [esp+14h+var_4]
|
|
.text:00011A4A add edi, edi
|
|
.text:00011A4C inc eax
|
|
.text:00011A4D cmp eax, 0Fh
|
|
.text:00011A50 mov [esp+14h+var_4], eax
|
|
.text:00011A54 jl short loc_11A1C
|
|
|
|
This part of the loop is quite un-interesting. In this segment the code
|
|
increments a counter in eax until eax is greater than 15. It is obvious that
|
|
it is not possible to cause a double free error in this case because the user
|
|
has no control over the loop condition or data within the loop. This ends the
|
|
investigation into a possible double free error.
|
|
|
|
Above is a good example of how to analyze loops that may be of interest. With
|
|
all binary analysis it is important to not only identify dangerous function
|
|
calls but to also identify if the attacker can control data that might be
|
|
manipulated or referenced within a loop.
|
|
|
|
|
|
6) Conclusion
|
|
|
|
During the course of this paper, the reader has had a chance to learn about the
|
|
different types of loops and some of the method of detecting them. The reader
|
|
has also gotten an in-depth view of the new IDA plug-in released with this
|
|
article. Hopefully now when the reader sees a loop, whether in code or binary,
|
|
the reader can explore the loop and determine if it is a security risk or not.
|
|
|
|
|
|
Bibliography
|
|
|
|
Tarjan, R. E. 1974. Testing flow graph reducibility. J
|
|
Comput. Syst. Sci. 9, 355-365.
|
|
|
|
Sreedhar, Vugranam, Guang Gao, Yong-Fong Lee. Identifying
|
|
loops using DJ graphs.
|
|
http://portal.acm.org/citation.cfm?id=236114.236115
|
|
|
|
Flake, Halvar. Automated Reverse Engineering.
|
|
http://www.blackhat.com/presentations/win-usa-04/bh-win-04-flake.pdf
|