mirror of
https://github.com/fdiskyou/Zines.git
synced 2025-03-09 00:00:00 +01:00
1459 lines
51 KiB
Text
1459 lines
51 KiB
Text
_ _
|
|
_/B\_ _/W\_
|
|
(* *) Phrack #64 file 10 (* *)
|
|
| - | | - |
|
|
| | Cryptanalysis of DPA-128 | |
|
|
| | | |
|
|
| | By SysK | |
|
|
| | | |
|
|
| | syskall@phreaker.net | |
|
|
(____________________________________________________)
|
|
|
|
|
|
--[ Contents
|
|
|
|
1 - Introduction
|
|
2 - A short word about block ciphers
|
|
3 - Overview of block cipher cryptanalysis
|
|
4 - Veins' DPA-128
|
|
4.1 - Bugs in the implementation
|
|
4.2 - Weaknesses in the design
|
|
5 - Breaking the linearized version
|
|
6 - On the non linearity of addition modulo n in GF(2)
|
|
7 - Exploiting weak keys
|
|
7.1 - Playing with a toy cipher
|
|
7.2 - Generalization and expected complexity
|
|
7.3 - Cardinality of |W
|
|
8 - Breaking DPA based unkeyed hash function
|
|
8.1 - Introduction to hash functions
|
|
8.2 - DPAsum() algorithm
|
|
8.3 - Weaknesses in the design/implementation
|
|
8.4 - A (2nd) preimage attack
|
|
9 - Conclusion
|
|
10 - Greetings
|
|
11 - Bibliography
|
|
|
|
|
|
--[ 1 - Introduction
|
|
|
|
While the cracking scene has grown with cryptology thanks to the evolution
|
|
of binary protection schemes, the hacking scene mostly hasn't. This fact
|
|
is greatly justified by the fact that there were globally no real need.
|
|
Indeed it's well known that if a hacker needs to decrypt some files then
|
|
he will hack into the box of its owner, backdoor the system and then use
|
|
it to steal the key. A cracker who needs to break a protection scheme will
|
|
not have the same approach: he will usually try to understand it fully in
|
|
order to find and exploit design and/or implementation flaws.
|
|
|
|
Although the growing of the security industry those last years changed a
|
|
little bit the situation regarding the hacking community, nowadays there
|
|
are still too many people with weak knowledge of this science. What is
|
|
disturbing is the broadcast of urban legends and other hoax by some
|
|
paranoids among them. For example, haven't you ever heard people claiming
|
|
that government agencies were able to break RSA or AES? A much more clever
|
|
question would have been: what does "break" mean?
|
|
|
|
A good example of paranoid reaction can be found in M1lt0n's article
|
|
[FakeP63]. The author who is probably skilled in hacking promotes the use
|
|
of "home made cryptographic algorithms" instead of standardized ones such
|
|
as 3DES. The corresponding argument is that since most so-called security
|
|
experts lake coding skills then they aren't able to develop appropriate
|
|
tools for exotic ciphers. While I agree at least partially with him
|
|
regarding the coding abilities, I can't possibly agree with the main
|
|
thesis. Indeed if some public tools are sufficient to break a 3DES based
|
|
protection then it means that a design and/or an implementation mistake
|
|
was/were made since, according to the state of the art, 3DES is still
|
|
unbroken. The cryptosystem was weak from the beginning and using "home
|
|
made cryptography" would only weaken it more.
|
|
|
|
It is therefore extremely important to understand cryptography and to
|
|
trust the standards. In a previous Phrack issue (Phrack 62), Veins exposed
|
|
to the hacking community a "home made" block cipher called DPA (Dynamic
|
|
Polyalphabetic Algorithms) [DPA128]. In the following paper, we are going
|
|
to analyze this cipher and demonstrate that it is not flawless - at least
|
|
from a cryptanalytic perspective - thus fitting perfectly with our talk.
|
|
|
|
|
|
--[ 2 - A short word about block ciphers
|
|
|
|
Let's quote a little bit the excellent HAC [MenVan]:
|
|
|
|
"A block cipher is a function which maps n-bit plaintext blocks to n-bit
|
|
ciphertext blocks; n is called the blocklength. It may be viewed as a
|
|
simple substitution cipher with large character size. The function is
|
|
parametrized by a k-bit key K, taking values from a subset |K (the key
|
|
space) of the set of all k-bit vectors Vk. It is generally assumed that
|
|
the key is chosen at random. Use of plaintext and ciphertext blocks of
|
|
equal size avoids data expansion."
|
|
|
|
Pretty clear isn't it? :> So what's the purpose of such a cryptosystem?
|
|
Obviously since we are dealing with encryption this class of algorithms
|
|
provides confidentiality. Its construction makes it particularly suitable
|
|
for applications such as large volumes encryption (files or HD for
|
|
example). Used in special modes such as CBC (like in OpenSSL) then it can
|
|
also provide stream encryption. For example, we use AES-CBC in the WPA2,
|
|
SSL and SSH protocols.
|
|
|
|
Remark: When used in conjunction with other mechanisms, block ciphers can
|
|
also provide services such as authentication or integrity (cf part 8 of
|
|
the paper).
|
|
|
|
An important point is the understanding of the cryptology utility. While
|
|
cryptography aims at designing best algorithms that is to say secure and
|
|
fast, cryptanalysis allows the evaluation of the security of those
|
|
algorithms. The more an algorithm is proved to have weaknesses, the less
|
|
we should trust it.
|
|
|
|
|
|
--[ 3 - Overview of block cipher cryptanalysis
|
|
|
|
The cryptanalysis of block ciphers evolved significantly in the 90s with
|
|
the apparition of some fundamental methods such as the differential
|
|
[BiSha90] and the linear [Matsui92] cryptanalysis. In addition to some
|
|
more recent ones like the boomerang attack of Wagner or the chi square
|
|
cryptanalysis of Vaudenay [Vaud], they constitute the set of so-called
|
|
statistical attacks on block ciphers in opposition to the very recent and
|
|
still controverted algebraic ones (see [CourtAlg] for more information).
|
|
|
|
Today the evolution of block cipher cryptanalysis tends to stabilize
|
|
itself. However a cryptographer still has to acquire quite a deep knowledge
|
|
of those attacks in order to design a cipher. Reading the Phrack paper, we
|
|
think - actually we may be wrong - that the author mostly based his design
|
|
on statistical tests. Although they are obviously necessary, they can't
|
|
possibly be enough. Every component has to be carefully chosen. We
|
|
identified several weaknesses and think that some more may still be left.
|
|
|
|
|
|
--[ 4 - Veins' DPA-128 description
|
|
|
|
DPA-128 is a 16 rounds block cipher providing 128 bits block encryption
|
|
using an n bits key. Each round encryption is composed of 3 functions
|
|
which are rbytechain(), rbitshift() and S_E(). Thus for each input block,
|
|
we apply the E() function 16 times (one per round) :
|
|
|
|
void E (unsigned char *key, unsigned char *block, unsigned int shift)
|
|
{
|
|
rbytechain (block);
|
|
rbitshift (block, shift);
|
|
S_E (key, block, shift);
|
|
}
|
|
|
|
where:
|
|
|
|
- block is the 128b input
|
|
- shift is a 32b parameter dependent of the round subkey
|
|
- key is the 128b round subkey
|
|
|
|
Consequently, the mathematical description of this cipher is:
|
|
f: |P x |K ----> |C
|
|
|
|
where:
|
|
- |P is the set of all plaintexts
|
|
- |K is the set of all keys
|
|
- |C is the set of all ciphertexts
|
|
|
|
For p element of |P, k of |K and c of |C, we have c = f(p,k)
|
|
with f = EE...EE = E^16 and meaning the composition of functions.
|
|
|
|
We are now going to describe each function. Since we sometimes may need
|
|
mathematics to do so, we will assume that the reader is familiar with
|
|
basic algebra ;>
|
|
|
|
rbytechain() is described by the following C function:
|
|
|
|
void rbytechain(unsigned char *block)
|
|
{
|
|
int i;
|
|
for (i = 0; i < DPA_BLOCK_SIZE; ++i)
|
|
block[i] ^= block[(i + 1) % DPA_BLOCK_SIZE];
|
|
return;
|
|
}
|
|
|
|
where:
|
|
- block is the 128b input
|
|
- DPA_BLOCK_SIZE equals 16
|
|
|
|
Such an operation on bytes is called linear mixing and its goal is to
|
|
provide the diffusion of information (according to the well known Shannon
|
|
theory). Mathematically, it's no more than a linear map between two GF(2)
|
|
vector spaces of dimension 128. Indeed, if U and V are vectors over GF(2)
|
|
representing respectively the input and the output of rbytechain() then
|
|
V = M.U where M is a 128x128 matrix over GF(2) of the linear map where
|
|
coefficients of the matrix are trivial to find. Now let's see rbitshift().
|
|
Its C version is:
|
|
|
|
void rbitshift(unsigned char *block, unsigned int shift)
|
|
{
|
|
unsigned int i;
|
|
unsigned int div;
|
|
unsigned int mod;
|
|
unsigned int rel;
|
|
unsigned char mask;
|
|
unsigned char remainder;
|
|
unsigned char sblock[DPA_BLOCK_SIZE];
|
|
|
|
if (shift)
|
|
{
|
|
mask = 0;
|
|
shift %= 128;
|
|
div = shift / 8;
|
|
mod = shift % 8;
|
|
rel = DPA_BLOCK_SIZE - div;
|
|
for (i = 0; i < mod; ++i)
|
|
mask |= (1 << i);
|
|
|
|
for (i = 0; i < DPA_BLOCK_SIZE; ++i)
|
|
{
|
|
remainder =
|
|
((block[(rel + i - 1) % DPA_BLOCK_SIZE]) & mask) << (8 - mod);
|
|
sblock[i] =
|
|
((block[(rel + i) % DPA_BLOCK_SIZE]) >> mod) | remainder;
|
|
}
|
|
}
|
|
memcpy(block, sblock, DPA_BLOCK_SIZE);
|
|
}
|
|
|
|
where:
|
|
- block is the 128b input
|
|
- DPA_BLOCK_SIZE equals 16
|
|
- shift is derived from the round subkey
|
|
|
|
Veins describes it in his paper as a key-related shifting (in fact it has
|
|
to be a key-related 'rotation' since we intend to be able to decrypt the
|
|
ciphertext ;)). A careful read of the code and several tests confirmed that
|
|
it was not erroneous (up to a bug detailed later in this paper), so we can
|
|
describe it as a linear map between two GF(2) vector spaces of dimension 128.
|
|
|
|
Indeed, if V and W are vectors over GF(2) representing respectively the
|
|
input and the output of rbitshift() then:
|
|
|
|
W = M'.V where M' is the 128x128 matrix over GF(2) of the linear
|
|
map where, unlike the previous function, coefficients of the matrix are
|
|
unknown up to a probability of 1/128 per round.
|
|
|
|
Such a function also provides diffusion of information.
|
|
|
|
Finally, the last operation S_E() is described by the C code:
|
|
|
|
void S_E (unsigned char *key, unsigned char *block, unsigned int s)
|
|
{
|
|
int i;
|
|
for (i = 0; i < DPA_BLOCK_SIZE; ++i)
|
|
block[i] = (key[i] + block[i] + s) % 256;
|
|
return;
|
|
}
|
|
|
|
where:
|
|
- block is the 128b input
|
|
- DPA_BLOCK_SIZE equals 16
|
|
- s is the shift parameter described in the previous function
|
|
- key is the round subkey
|
|
|
|
The main idea of veins' paper is the so-called "polyalphabetic substitution"
|
|
concept, whose implementation is supposed to be the S_E() C function.
|
|
Reading the code, it appears to be no more than a key mixing function over
|
|
GF(2^8).
|
|
|
|
Remark: We shall see later the importance of the mathematical operation
|
|
know as 'addition' over GF(2^8). Regarding the key scheduling, each cipher
|
|
round makes use of a 128b subkey as well as of a 32b one deriving from it
|
|
called "shift". The following pseudo code describes this operation:
|
|
|
|
skey(0) = checksum128(master_key)
|
|
for i = 0, nbr_round-2:
|
|
skey(i+1) = checksum128(skey(i))
|
|
skey(0) = skey(15)
|
|
for i = 0, nbr_round-1:
|
|
shift(nbr_round-1 - i) = hash32(skey(i))
|
|
|
|
where skey(i) is the i'th subkey.
|
|
|
|
It is not necessary to explicit the checksum128() and hash32(), the reader
|
|
just has to remind this thing: whatever the weakness there may be in those
|
|
functions, we will now consider them being true oneway hash functions
|
|
providing perfect entropy.
|
|
|
|
As a conclusion, the studied cipher is closed to being a SPN (Substitution
|
|
- Permutation Network) which is a very generic and well known construction
|
|
(AES is one for example).
|
|
|
|
|
|
--[ 4.1 - Bugs in the implementation
|
|
|
|
Although veins himself honestly recognizes that the cipher may be weak and
|
|
"strongly discourages its use" to quote him [DPA128], some people could
|
|
nevertheless decide to use it as a primitive for encryption of personal
|
|
and/or sensitive data as an alternative to 'already-cracked-by-NSA'
|
|
ciphers [NSA2007]. Unfortunately for those theoretical people, we were able
|
|
to identify a bug leading to a potentially incorrect functioning of the
|
|
cryptosystem (with a non negligible probability).
|
|
|
|
We saw earlier that the bitshift code skeleton was the following:
|
|
|
|
/* bitshift.c */
|
|
void {r,l}bitshift(unsigned char *block, unsigned int shift)
|
|
{
|
|
[...] // SysK : local vars declaration
|
|
unsigned char sblock[DPA_BLOCK_SIZE];
|
|
if (shift)
|
|
{
|
|
[...] // SysK : sblock initialization
|
|
}
|
|
memcpy(block, sblock, DPA_BLOCK_SIZE);
|
|
}
|
|
|
|
Clearly, if 'shift' is 0 then 'block' is fed with stack content! Obviously
|
|
in such a case the cryptosystem can't possibly work.
|
|
|
|
Since shift is an integer, such an event occurs with at least a theoretical
|
|
probability of 1/2^32 per round.
|
|
|
|
Now let's study the shift generation function:
|
|
|
|
/* hash32.c */
|
|
/*
|
|
* This function computes a 32 bits output out a variable length input. It is
|
|
* not important to have a nice distribution and low collisions as it is used
|
|
* on the output of checksum128() (see checksum128.c). There is a requirement
|
|
* though, the function should not consider \0 as a key terminator.
|
|
*/
|
|
|
|
unsigned long hash32(unsigned char *k, unsigned int length)
|
|
{
|
|
unsigned long h;
|
|
for (h = 0; *k && length; ++k, --length)
|
|
h = 13 * h + *k;
|
|
return (h);
|
|
}
|
|
|
|
As stated in the C code commentary, hash32() is the function which produces
|
|
the shift. Although the author is careful and admits that the output
|
|
distribution may not be completely uniform (not exactly equal probability
|
|
for each byte value to appear) it is obvious that a strong bias is not
|
|
desirable (Cf 7.3).
|
|
|
|
However what happens if the first byte pointed by k is 0 ? Since the loop
|
|
ends for k equal to 0, then h will be equal to 13 * 0 + 0 = 0. Assuming
|
|
that the underlying subkey is truly random, such an event should occur with
|
|
a probability of 1/256 (instead of 1/2^32). Since the output of hash32() is
|
|
an integer as stated in the comment, this is clearly a bug.
|
|
|
|
We could be tempted to think that this implementation failure leads to a
|
|
weakness but a short look at the code tells us that:
|
|
|
|
struct s_dpa_sub_key {
|
|
unsigned char key[DPA_KEY_SIZE];
|
|
unsigned char shift;
|
|
};
|
|
|
|
typedef struct s_dpa_sub_key DPA_SUB_KEY;
|
|
|
|
Therefore since shift is a char object, the presence of "*k &&" in the code
|
|
doesn't change the fact that the cryptosystem will fail with a probability
|
|
of 1/256 per round.
|
|
|
|
Since the bug may appear independently in each round, the probability of
|
|
failure is even greater:
|
|
|
|
p("fail") = 1 - p("ok")
|
|
= 1 - Mul( p("ok in round i") )
|
|
= 1 - (255/256)^16
|
|
= 0.0607...
|
|
|
|
where i is element of [0, (nbr_rounds - 1)]
|
|
It's not too far from 1/16 :-)
|
|
|
|
Remark: We shall see later that the special case where shift is equal to 0
|
|
is part of a general class of weak keys potentially allowing an attacker to
|
|
break the cryptosystem.
|
|
|
|
Hunting weaknesses and bugs in the implementation of cryptographic
|
|
primitives is the common job of some reverse engineers since it sometimes
|
|
allows to break implementations of algorithms which are believed to be
|
|
theoretically secure. While those flaws mostly concern asymmetric
|
|
primitives of digital signature or key negotiation/generation, it can also
|
|
apply in some very specific cases to the block cipher world.
|
|
|
|
From now, we will consider the annoying bug in bitshift() fixed.
|
|
|
|
|
|
--[ 4.2 - Weaknesses in the design
|
|
|
|
When designing a block cipher, a cryptographer has to be very careful about
|
|
every details of the algorithm. In the following section, we describe
|
|
several design mistakes and explain why in some cases, it can reduce the
|
|
security of the cipher.
|
|
|
|
a) We saw earlier that the E() function was applied to each round. However
|
|
such a construction is not perfect regarding the first round. Since
|
|
rbytechain() is a linear mixing operating not involving key material, it
|
|
shouldn't be used as the first operation on the input buffer since its
|
|
effect on it can be completely canceled. Therefore, if a cryptanalyst wants
|
|
to attack the bitshift() component of the first round, he just have to
|
|
apply lbytechain() (the rbytechain() inverse function) to the input vector.
|
|
It would thus have been a good idea to put a key mixing as the first
|
|
operation.
|
|
|
|
b) The rbitshift() operation only need the 7 first bits of the shift
|
|
character whereas the S_E() uses all of them. It is also generally
|
|
considered a bad idea to use the same key material for several operations.
|
|
|
|
c) If for some reason, the attacker is able to leak the second (not the
|
|
first) subkey then it implies the compromising of all the key material. Of
|
|
course the master key will remain unknown because of the onewayness of
|
|
checksum128() however we do not need to recover it in order to encrypt
|
|
and/or decrypt datas.
|
|
|
|
d) In the bitshift() function, a loop is particularly interesting:
|
|
|
|
for (i = 0; i < mod; ++i)
|
|
mask |= (1 << i);
|
|
|
|
What is interesting is that the time execution of the loop is dependent of
|
|
"mod" which is derived from the shift. Therefore we conclude that this loop
|
|
probably allows a side channel attack against the cipher. Thanks to X for
|
|
having pointed this out ;> In the computer security area, it's well known
|
|
that a single tiny mistake can lead to the total compromising of an
|
|
information system. In cryptography, the same rules apply.
|
|
|
|
|
|
--[ 5 - Breaking the linearized version
|
|
|
|
Even if we regret the non justification of addition operation employment,
|
|
it is not the worst choice in itself. What would have happen if the key
|
|
mixing had been done with a xor operation over GF(2^8) instead as it is the
|
|
case in DES or AES for example?
|
|
|
|
To measure the importance of algebraic consideration in the security of a
|
|
block cipher, let's play a little bit with a linearized version of the
|
|
cipher. That is to say that we replace the S_E() function with the
|
|
following S_E2() where :
|
|
|
|
void S_E2 (unsigned char *key, unsigned char *block, unsigned int s)
|
|
{
|
|
int i;
|
|
for (i = 0; i < DPA_BLOCK_SIZE; ++i)
|
|
block[i] = (key[i] ^ block[i] ^ s) % 256; [1]
|
|
// + is replaced by xor
|
|
return;
|
|
}
|
|
|
|
If X, Y and K are vectors over GF(2^8) representing respectively the input,
|
|
the output of S_E2() and the round key material then Y = X xor K.
|
|
|
|
Remark: K = sK xor shift. We use K for simplification purpose.
|
|
|
|
Now considering the full round we have :
|
|
|
|
V = M.U [a] (rbytechain)
|
|
W = M'.V [b] (rbitshift)
|
|
Y = W xor K [c] (S_E2)
|
|
|
|
Linear algebra allows the composition of applications rbytechain() and
|
|
rbitshift() since the dimensions of M and M' match but W in [b] is a vector
|
|
over GF(2) whereas W in [c] is clearly over GF(2^8). However, due to the
|
|
use of XOR in [c], Y, W and K can also be seen as vectors over GF(2).
|
|
Therefore, S_E2() is a GF(2) affine map between two vector spaces of
|
|
dimension 128.
|
|
|
|
We then have:
|
|
|
|
Y = M'.M.U xor K
|
|
|
|
The use of differential cryptanalysis will help us to get rid of the key.
|
|
Let's consider couples (U0,Y0 = E(U0)) and (U1,Y1 = E(U1)) then:
|
|
|
|
DELTA(Y) = Y0 xor Y1
|
|
= (M'.M.U0 xor K) xor (M'.M.U1 xor K)
|
|
= (M'.M.U0 xor M'.M.U1) xor K xor K (commutativity &
|
|
associativity of xor)
|
|
= (M'.M).(U0 xor U1) (distributivity)
|
|
= (M'.M).DELTA(U)
|
|
|
|
Such a result shows us that whatever sK and shift are, there is always a
|
|
linear map linking an input differential to the corresponding output
|
|
differential.
|
|
|
|
The generalization to the 16 rounds using matrix multiplication is obvious.
|
|
Therefore we have proved that there exists a 128x128 matrix Mf over GF(2)
|
|
such as DELTA(Y) = Mf.DELTA(X) for the linearized version of the cipher.
|
|
|
|
Then assuming we know one couple (U0,Y0) and Mf, we can encrypt any input U.
|
|
Indeed, Y xor Y0 = Mf.(U xor U0) therefore Y = (Mf.(U xor U0)) xor Y0.
|
|
|
|
Remark 1: The attack doesn't give us the knowledge of subkeys and shifts
|
|
but such a thing is useless. The goal of an attacker is not the key in
|
|
itself but rather the ability of encrypting/decrypting a set of
|
|
plaintexts/ciphertexts. Furthermore, considering the key scheduling
|
|
operation, if we really needed to recover the master key, it would be quite
|
|
a pain in the ass considering the fact that checksum128() is a one way
|
|
function ;-)
|
|
|
|
Remark 2: Obviously in order to decrypt any output Y we need to calculate
|
|
Mf^-1 which is the inverse matrix of Mf. This is somewhat more interesting
|
|
isn't it ? :-)
|
|
|
|
Because of rbitshift(), we are unable to determine using matrix
|
|
multiplications the coefficients of Mf. An exhaustive search is of course
|
|
impossible because of the huge complexity (2^16384) however, finding them
|
|
is equivalent to solving 128 systems (1 system per row of Mf) of 128
|
|
variables (1 variable per column) in GF(2). To build such a system, we need
|
|
128 couples of (cleartext,ciphertext). The described attack was implemented
|
|
using the nice NTL library ([SHOUP]) and can be found in annexe A of this
|
|
paper.
|
|
|
|
$ g++ break_linear.cpp bitshift.o bytechain.o key.c hash32.o checksum128.o
|
|
-o break_linear -lntl -lcrypto -I include
|
|
$ ./break_linear
|
|
[+] Generating the plaintexts / ciphertexts
|
|
[+] NTL stuff !
|
|
[+] Calculation of Mf
|
|
[+] Let's make a test !
|
|
[+] Well done boy :>
|
|
|
|
Remark: Sometimes NTL detects a linear relation between chosen inputs
|
|
(DELTA_X) and will then refuse to work. Indeed, in order to solve the 128
|
|
systems, we need a situation where every equations are independent. If it's
|
|
not the case, then obviously det(M) is equal to 0 (with probability 1/2).
|
|
Since inputs are randomly generated, just try again until it works :-)
|
|
|
|
$ ./break_linear
|
|
[+] Generating the plaintexts / ciphertexts
|
|
[+] NTL stuff !
|
|
det(M) = 0
|
|
|
|
As a conclusion we saw that the linearity over GF(2) of the xor operation
|
|
allowed us to write an affine relation between two elements of GF(2)^128 in
|
|
the S_E2() function and then to easily break the linearized version using a
|
|
128 known plaintext attack. The use of non linearity is crucial in the
|
|
design. Fortunately for DPA-128, Veins chose the addition modulo 256 as the
|
|
key mixer which is naturally non linear over GF(2).
|
|
|
|
|
|
--[ 6 - On the non linearity of addition modulo n over GF(2)
|
|
|
|
The bitshift() and bytechain() functions can be described using matrix over
|
|
GF(2) therefore it is interesting to use this field for algebraic
|
|
calculations.
|
|
|
|
The difference between addition and xor laws in GF(2^n) lies in the carry
|
|
propagation:
|
|
|
|
w(i) + k(i) = w(i) xor k(i) xor carry(i)
|
|
where w(i), k(i) and carry(i) are elements of GF(2).
|
|
|
|
We note w(i) as the i'th bit of w and will keep this notation until the end.
|
|
carry(i), written c(i) for simplification purpose, is defined recursively:
|
|
|
|
c(i+1) = w(i).k(i) xor w(i).c(i) xor k(i).c(i)
|
|
with c(0) = 0
|
|
|
|
Using this notation, it would thus be possible to determine a set of
|
|
relations over GF(2) between input/output bits which the attacker controls
|
|
using a known plaintext attack and the subkey bits (which the attacker
|
|
tries to guess).
|
|
|
|
However, recovering the subkey bits won't be that easy. Indeed, to determine
|
|
them, we need to get rid of the carries replacing them by multivariate
|
|
polynomials were unknowns are monomials of huge order.
|
|
|
|
Remark 1: Because of the recursivity of the carry, the order of monomials
|
|
grows up as the number of input bits per round as well as the number of
|
|
rounds increases.
|
|
|
|
Remark 2: Obviously we can not use intermediary input/output bits in our
|
|
equations. This is because unlike the subkey bits, they are dependent of the
|
|
input.
|
|
|
|
We are thus able to express the cryptosystem as a multivariate polynomial
|
|
system over GF(2). Solving such a system is NP-hard. There exists methods
|
|
for system of reasonable order like groebner basis and relinearization
|
|
techniques but the order of this system seems to be far too huge.
|
|
|
|
However for a particular set of keys, the so-called weak keys, it is
|
|
possible to determine the subkeys quite easily getting rid of the complexity
|
|
introduced by the carry.
|
|
|
|
|
|
--[ 7 - Exploiting weak keys
|
|
|
|
Let's first define a weak key. According to wikipedia:
|
|
|
|
"In cryptography, a weak key is a key which when used with a specific
|
|
cipher, makes the cipher behave in some undesirable way. Weak keys usually
|
|
represent a very small fraction of the overall keyspace, which usually
|
|
means that if one generates a random key to encrypt a message weak keys are
|
|
very unlikely to give rise to a security problem. Nevertheless, it is
|
|
considered desirable for a cipher to have no weak keys."
|
|
|
|
Actually we identified a particular subset |W of |K allowing us to deal
|
|
quite easily with the carry problem. A key "k" is part of |W if and only if
|
|
for each round the shift parameter is a multiple of 8. The reader should
|
|
understand why later.
|
|
|
|
We will first present the attack on a reduced version of DPA for simplicity
|
|
purpose and generalize it later to the full version.
|
|
|
|
|
|
--[ 7.1 - Playing with a toy cipher
|
|
|
|
Our toy cipher is a 2 rounds DPA. Moreover, the cipher takes as input 4*8
|
|
bits instead of 16*8 = 128 bits which means that DPA_BLOCK_SIZE = 4. We
|
|
also make a little modification in bytechain() operation. Let's remember
|
|
the bytechain() function:
|
|
|
|
void rbytechain(unsigned char *block)
|
|
{
|
|
int i;
|
|
for (i = 0; i < DPA_BLOCK_SIZE; ++i)
|
|
block[i] ^= block[(i + 1) % DPA_BLOCK_SIZE];
|
|
return;
|
|
}
|
|
|
|
Since block is both input AND output of the function then we have for
|
|
DPA_BLOCK_SIZE = 4:
|
|
|
|
V(0) = U(0) xor U(1)
|
|
V(1) = U(1) xor U(2)
|
|
V(2) = U(2) xor U(3)
|
|
V(3) = U(3) xor V(0) = U(0) xor U(1) xor U(3)
|
|
|
|
Where V(x) is the x'th byte element.
|
|
|
|
Thus with our modification:
|
|
|
|
V(0) = U(0) xor U(1)
|
|
V(1) = U(1) xor U(2)
|
|
V(2) = U(2) xor U(3)
|
|
V(3) = U(3) xor U(0)
|
|
|
|
Regarding the mathematical notation (pay your ascii !@#):
|
|
|
|
- U,V,W,Y vector notation of section 5 remains.
|
|
- Xj(i) is the i'th bit of vector Xj where j is j'th round.
|
|
- U0 vector is equivalent to P where P is a plaintext.
|
|
- m is the shift of round 0
|
|
- n is the shift of round 1
|
|
- xor will be written '+' since calculation is done in GF(2)
|
|
- All calculation of subscript will be done in the ring ZZ_32
|
|
|
|
How did we choose |W? Using algebra in GF(2) implies to deal with the carry.
|
|
However, if k is a weak key (part of |W), then we can manage the calculation
|
|
so that it's not painful anymore.
|
|
|
|
Let i be the lowest bit of any input byte. Therefore for each i part of the
|
|
set {0,8,16,24} we have:
|
|
|
|
u0(i) = p(i)
|
|
v0(i) = p(i) + p(i+8)
|
|
w0(i+m) = v0(i)
|
|
y0(i) = w0(i) + k0(i) + C0(i)
|
|
y0(i+m) = w0(i+m) + k0(i+m) + C0(i+m)
|
|
y0(i+m) = p(i) + p(i+8) + k0(i+m) + C0(i+m) /* carry(0) = 0 */
|
|
y0(i+m) = p(i) + p(i+8) + k0(i+m)
|
|
|
|
u1(i) = y0(i)
|
|
v1(i) = y0(i) + y0(i+8)
|
|
w1(i+n) = v1(i)
|
|
y1(i) = w1(i) + k1(i) + C1(i)
|
|
y1(i+n) = w1(i+n) + k1(i+n) + C1(i+n)
|
|
y1(i+n) = y0(i) + y0(i+8) + k1(i+n) + C1(i+n)
|
|
y1(i+n+m) = y0(i+m) + y0(i+m+8) + k1(i+n+m) + C1(i+n+m) /* carry(0) = 0 */
|
|
y1(i+n+m) = p(i) + p(i+8) + k0(i+m) + p(i+8) + p(i+16)
|
|
+ k0(i+m+8) + k1(i+n+m)
|
|
y1(i+n+m) = p(i) + k0(i+m) + p(i+16) + k0(i+m+8) + k1(i+n+m)
|
|
|
|
As stated before, i is part of the set {0,8,16,24} so we can write:
|
|
|
|
y1(n+m) = p(0) + k0(m) + p(16) + k0(m+8) + k1(n+m)
|
|
y1(8+n+m) = p(8) + k0(8+m) + p(24) + k0(m+16) + k1(8+n+m)
|
|
y1(16+n+m) = p(16) + k0(16+m) + p(0) + k0(m+24) + k1(16+n+m)
|
|
y1(24+n+m) = p(24) + k0(24+m) + p(8) + k0(m) + k1(24+n+m)
|
|
|
|
In the case of a known plaintext attack, the attacker has the knowledge of
|
|
a set of couples (P,Y1). Therefore considering the previous system, the
|
|
lowest bit of K0 and K1 vectors are the unknowns. Here we have a system
|
|
which is clearly underdefined since it is composed of 4 equations and
|
|
4*2 unknowns. It will give us the relations between each lowest bit of Y
|
|
and the lowest bits of K0 and K1.
|
|
|
|
Remark 1: n,m are unknown. A trivial approach is to determine them which
|
|
costs a complexity of (2^4)^2 = 2^8. Although it may seem a good idea,
|
|
let's recall the reader that we are considering a round reduced cipher!
|
|
Indeed, applying the same idea to the full 16 rounds would cost us
|
|
(2^4)^16 = 2^64! Such a complexity is a pain in the ass even nowadays :-)
|
|
|
|
A much better approach is to guess (n+m) as it costs 2^4 what ever the
|
|
number of rounds. It gives us the opportunity to write relations between
|
|
some input and output bits. We do not need to know exactly m and n. The
|
|
knowledge of the intermediate variables k0(x+m) and k1(y+n+m) is
|
|
sufficient.
|
|
|
|
Remark 2: An underdefined system brings several solutions. We are
|
|
thus able to choose arbitrarily 4 variables thus fixing them with values of
|
|
our choice. Of course we have to choose so that we are able to solve the
|
|
system with remaining variables. For example taking k0(m), k0(m+8) and
|
|
k1(n+m) together is not fine because of the first equation. However, fixing
|
|
all the k0(x+m) may be a good idea as it automatically gives the k1(y+n+m)
|
|
corresponding ones.
|
|
|
|
Now let's go further. Let i be part of the set {1,9,17,25}. We can write:
|
|
|
|
u0(i) = p(i)
|
|
v0(i) = p(i) + p(i+8)
|
|
w0(i+m) = v0(i)
|
|
y0(i) = w0(i) + k0(i) + w0(i-1)*k0(i-1)
|
|
y0(i+m) = w0(i+m) + k0(i+m) + w0(i+m-1)*k0(i+m-1)
|
|
y0(i+m) = p(i) + p(i+8) + k0(i+m) + w0(i+m-1)*k0(i+m-1)
|
|
y0(i+m) = p(i) + p(i+8) + k0(i+m) + (p(i-1) + p(i-1+8))*k0(i+m-1)
|
|
|
|
u1(i) = y0(i)
|
|
v1(i) = y0(i) + y0(i+8)
|
|
w1(i+n) = v1(i)
|
|
y1(i) = w1(i) + k1(i) + C1(i)
|
|
y1(i) = w1(i) + k1(i) + w1(i-1)*k1(i-1)
|
|
y1(i+n) = w1(i+n) + k1(i+n) + w1(i-1+n)*k1(i-1+n)
|
|
y1(i+n) = y0(i) + y0(i+8) + k1(i+n) + (y0(i-1) + y0(i+8-1)) * k1(i-1+n)
|
|
|
|
y1(i+n+m) = y0(i+m) + y0(i+m+8) + k1(i+m+n)
|
|
+ (y0(i+m-1) + y0(i+m+8-1)) * k1(i+m+n-1)
|
|
|
|
y1(i+n+m) = p(i) + p(i+8) + k0(i+m) + (p(i-1) + p(i-1+8)) * k0(i+m-1)
|
|
+ p(i+8) + p(i+16) + k0(i+m+8)
|
|
+ (p(i+8-1) + p(i-1+16)) * k0(i+m-1+8)
|
|
+ k1(i+n+m)
|
|
+ k1(i+m+n-1) * [p(i-1) + p(i+8-1) + k0(i+m-1)]
|
|
+ k1(i+m+n-1) * [p(i-1+8) + p(i+16-1) + k0(i+m-1+8)]
|
|
|
|
y1(i+n+m) = p(i) + k0(i+m) + (p(i-1) + p(i-1+8)) * k0(i+m-1)
|
|
+ p(i+16) + k0(i+m+8) + (p(i+8-1) + p(i-1+16)) * k0(i+m-1+8)
|
|
+ k1(i+n+m)
|
|
+ k1(i+m+n-1)*[p(i-1) + k0(i+m-1)]
|
|
+ k1(i+m+n-1)*[p(i-1+16) + k0(i+m-1+8)]
|
|
|
|
Thanks to the previous system resolution, we have the knowledge of
|
|
k0(i+m+n-1+x) and k1(i+m-1+y) variables. Therefore, we can reduce the
|
|
previous equation to:
|
|
|
|
A(i) = k0(i+m) + k0(i+m+8) + k1(i+n+m) (alpha)
|
|
|
|
where A(i) is a known value for the attacker.
|
|
|
|
Remark 1: This equation represents the same system as found in case of i
|
|
being the lowest bit! Therefore all previous remarks remain.
|
|
|
|
Remark 2: If we hadn't have the knowledge of k0(i+m+n-1+x) and k1(i+m-1+y)
|
|
bits then the number of variables would have grown seriously. Moreover we
|
|
would have had to deal with some degree 2 monomials :-/.
|
|
|
|
We can thus conjecture that the equation alpha will remain true for each i
|
|
part of {a,a+8,a+16,a+24} where 0 <= a < 8.
|
|
|
|
|
|
--[ 7.2 - Generalization and expected complexity
|
|
|
|
Let's deal with the real bytechain() function now.
|
|
As stated before and for DPA_BLOCK_SIZE = 4 we have:
|
|
|
|
V(0) = U(0) xor U(1)
|
|
V(1) = U(1) xor U(2)
|
|
V(2) = U(2) xor U(3)
|
|
V(3) = U(0) xor U(1) xor U(3)
|
|
|
|
This is clearly troublesome as the last byte V(3) is NOT calculated like
|
|
V(0), V(1) and V(2). Because of the rotations involved, we wont be able to
|
|
know when the bit manipulated is part of V(3) or not.
|
|
|
|
Therefore, we have to use a general formula:
|
|
|
|
V(i) = U(i) + U(i+1) + a(i).U(i+2)
|
|
where a(i) = 1 for i = 24 to 31
|
|
|
|
For i part of {0,8,16,24} we have:
|
|
|
|
u0(i) = p(i)
|
|
v0(i) = p(i) + p(i+8) + a0(i).p(i+16)
|
|
w0(i+m) = v0(i)
|
|
y0(i) = w0(i) + k0(i) + C0(i)
|
|
y0(i+m) = w0(i+m) + k0(i+m) + C0(i+m)
|
|
y0(i+m) = p(i) + p(i+8) + a0(i).p(i+16) + k0(i+m) + C0(i+m) /*carry(0) = 0*/
|
|
y0(i+m) = p(i) + p(i+8) + a0(i).p(i+16) + k0(i+m)
|
|
|
|
So in the second round:
|
|
|
|
u1(i) = y0(i)
|
|
v1(i) = y0(i) + y0(i+8) + a1(i).y0(i+16)
|
|
w1(i+n) = v1(i)
|
|
y1(i) = w1(i) + k1(i) + C1(i)
|
|
y1(i+n) = w1(i+n) + k1(i+n) + C1(i+n)
|
|
y1(i+n) = y0(i) + y0(i+8) + a1(i).y0(i+16) + k1(i+n) + C1(i+n)
|
|
y1(i+n+m) = y0(i+m) + y0(i+m+8) + a1(i+m).y0(i+m+16) + k1(i+n+m)
|
|
|
|
y1(i+n+m) = p(i) + p(i+8) + a0(i).p(i+16) + k0(i+m)
|
|
+ p(i+8) + p(i+16) + a0(i).p(i+24) + k0(i+m+8)
|
|
+ a1(i+m).[p(i+16) + p(i+24) + a0(i).p(i) + k0(i+m+16)] + k1(i+n+m)
|
|
|
|
y1(i+n+m) = p(i) + a0(i).p(i+16) + k0(i+m)
|
|
+ p(i+16) + a0(i).p(i+24) + k0(i+m+8)
|
|
+ a1(i+m).[p(i+16) + p(i+24) + a0(i).p(i) + k0(i+m+16)] + k1(i+n+m)
|
|
|
|
a0(i) is not a problem since we know it. This is coherent with the fact
|
|
that the first operation of the cipher is rbytechain() which is invertible
|
|
for the attacker. However, the problem lies in the a1(i+m) variables.
|
|
|
|
Guessing a1(i+m) is out of question as it would cost us a complexity of
|
|
(2^4)^15 = 2^60 for the 16 rounds! The solution is to consider a1(i+m) as
|
|
an other set of 4 variables. We can also add the equation to our system:
|
|
|
|
a1(m) + a1(m+8) + a1(m+16) + a1(m+24) = 1
|
|
This equation will remain true for other bits.
|
|
|
|
So what is the global complexity? Obviously with DPA_BLOCK_SIZE = 16 each
|
|
system is composed of 16+1 equations of 16+1 variables (we fixed the
|
|
others). Therefore, the complexity of the resolution is:
|
|
log(17^3) / log(2) ~ 2^13.
|
|
|
|
We will solve 8 systems since there are 8 bits per byte. Thus the global
|
|
complexity is around (2^13)*8 = 2^16.
|
|
|
|
Remark: We didn't take into account the calculation of equation as it is
|
|
assumed to be determined using a formal calculation program such as pari-gp
|
|
or magma.
|
|
|
|
|
|
--[ 7.3 - Cardinality of |W
|
|
|
|
What is the probability of choosing a weak key? We have seen that our weak
|
|
key criterion is that for each round, the rotation parameter needs to be
|
|
multiple of 8. Obviously, it happens with 16 / 128 = 1/8 theoretical
|
|
probability per round. Since we consider subkeys being random, the
|
|
generation of rotation parameters are independent which means that the
|
|
overall probability is (1/16)^16 = 1/2^64.
|
|
|
|
Although a probability of 1/2^64 means a (huge) set of 2^64 weak keys, in
|
|
the real life, there are very few chances to choose one of them. In fact,
|
|
you probably have much more chances to win lottery ;) However, two facts
|
|
must be noticed:
|
|
|
|
- We presented one set of weak keys but there be some more!
|
|
- We illustrated an other weakness in the conception of DPA-128
|
|
|
|
Remark: A probability of 1/8 per round is completely theoretic as it
|
|
supposes a uniform distribution of hash32() output. Considering the extreme
|
|
simplicity of the hash32() function, it wouldn't be too surprising to be
|
|
different in practice. Therefore we made a short test to compute the real
|
|
probability (Annexe B).
|
|
|
|
$ gcc test.hash32.c checksum128.o hash32.o -o test.hash32 -O3
|
|
-fomit-frame-pointer
|
|
$ time ./test.hash32
|
|
[+] Probability is 0.125204
|
|
|
|
real 0m14.654s
|
|
user 0m14.649s
|
|
sys 0m0.000s
|
|
|
|
$ gp -q
|
|
? (1/0.125204) ^ 16
|
|
274226068900783.2739747241633
|
|
? log(274226068900783.2739747241633) / log(2)
|
|
47.96235905375676878381741198
|
|
?
|
|
|
|
This result tells us clearly that the probability of shift being multiple
|
|
of 8 is around 1/2^2.99 ~ 1/8 per round which is assimilated to the
|
|
theoretical one since the difference is too small to be significant. In
|
|
order to improve the measure, we used checksum128() as an input of
|
|
hash32(). Furthermore, we also tried to test hash32() without the "*k &&"
|
|
bug mentioned earlier. Both tests gave similar results which means that the
|
|
bug is not important in practice and that checksum128() doesn't seem to be
|
|
particularly skewed. This is a good point for DPA! :-D
|
|
|
|
|
|
--[ 8 - Breaking DPA-based unkeyed hash function
|
|
|
|
In his paper, Veins also explains how a hash function can be built out of
|
|
DPA. We will analyze the proposed scheme and will show how to completely
|
|
break it.
|
|
|
|
|
|
--[ 8.1 - Introduction to hash functions
|
|
|
|
Quoting once again the excellent HAC [MenVan]:
|
|
"A hash function is a function h which has, as a minimum, the following two
|
|
properties:
|
|
|
|
1. compression - h maps an input x of arbitrary finit bitlength, to an
|
|
output h(x) of fixed bitlength n.
|
|
2. ease of computation - given h and an input x, h(x) is easy to compute.
|
|
|
|
In cryptography there are essentially two families of hash functions:
|
|
|
|
1. The MAC (Message Authentication Codes). They are keyed ones and provides
|
|
both authentication (of source) and integrity of messages.
|
|
2. The MDC (Modification Detection Code), sometimes referred as MIC. They
|
|
are unkeyed and only provide integrity. We will focus on this kind of
|
|
functions. When designing his hash function, the cryptographer generally
|
|
wants it to satisfy the three properties:
|
|
|
|
- preimage resistance. For any y, it should not be possible (that is to say
|
|
computationally infeasible) to find an x such as h(x) = y. Such a property
|
|
implies that the function has to be non invertible.
|
|
- 2nd preimage resistance. For any x, it should not be possible to find an
|
|
x' such as h(x) = h(x') when x and x' are different.
|
|
- collision resistance. It should not be possible to find an x and an x'
|
|
(with x different of x') such that h(x) = h(x').
|
|
|
|
Remark 1: Properties 1 and 2 and essentials when dealing with binary
|
|
integrity.
|
|
|
|
Remark 2: The published attacks on MD5 and SHA-0/SHA-1 were dealing with the
|
|
third property. While it is true that finding collisions on a hash function
|
|
is enough for the crypto community to consider it insecure (and sometimes
|
|
leads to a new standard [NIST2007]), for most of usages it still remains
|
|
sufficient.
|
|
|
|
There are many way to design an MDC function. Some functions are based on
|
|
MD4 function such as MD5 or SHA* functions which heavily rely on boolean
|
|
algebra and operations in GF(2^32), some are based on NP problems such as
|
|
RSA and finally some others are block cipher based.
|
|
|
|
The third category is particularly interesting since the security of the
|
|
hash function can be reduced to the one of the underlying block cipher.
|
|
This is of course only true with a good design.
|
|
|
|
|
|
--[ 8.2 - DPAsum() algorithm
|
|
|
|
The DPA-based hash function lies in the functions DPA_sum() and
|
|
DPA_sum_write_to_file() which can be found respectively in file sum.c and
|
|
data.c.
|
|
|
|
Let's detail them a little bit using pseudo code:
|
|
|
|
Let M be the message to hash, let M(i) be the i'th 128b block message.
|
|
Let N = DPA_BLOCK_SIZE * i + j be the size in bytes of the message where i
|
|
and j are integers such as i = N / DPA_BLOCK_SIZE and 0 <= j < 16.
|
|
Let C be an array of 128 bits elements were intermediary results of hash
|
|
calculation are stored. The last element of this array is the hash of the
|
|
message.
|
|
|
|
func DPA_sum(K0,M,C):
|
|
|
|
K0 = key("deadbeef");
|
|
IV = "0123456789abcdef";
|
|
|
|
C(0) = E( IV , K0);
|
|
C(1) = E( IV xor M(0) , K0);
|
|
|
|
FOR a = 1 to i-1:
|
|
C(a+1) = E( C(a) xor M(a) , K0);
|
|
|
|
if j == 0:
|
|
C(i+1) = E( C(i) xor 000...000 , K0)
|
|
else
|
|
C(i+1) = E( C(i) xor PAD( M(i) );
|
|
C(i+2) = E( C(i+1) xor 000...00S , K0) /* s = 16-j */
|
|
return;
|
|
|
|
func DPA_sum_write_to_file(C, file):
|
|
|
|
write(file,C(last_element));
|
|
return;
|
|
|
|
|
|
--[ 8.3 - Weaknesses in the design/implementation
|
|
|
|
We noticed several implementation mistakes in the code:
|
|
|
|
a) Using the algorithm of hash calculation, every element of array C is
|
|
defined recursively however C(0) is never used in calculation. This doesn't
|
|
impact security in itself but is somewhat strange and could let us think
|
|
that the function was not designed before being programmed.
|
|
|
|
b) When the size of M is not a multiple of DPA_BLOCK_SIZE (j is not equal
|
|
to 0) then the algorithms calculates the last element using a xor mask where
|
|
the last byte gives information on the size of the original message.
|
|
However, what is included in the padding is not the size of the message in
|
|
itself but rather the size of padding.
|
|
|
|
If we take the example of the well known Merkle-Damgard construction on
|
|
which are based MD{4,5} and SHA-{0,1} functions, then the length of the
|
|
message was initially appended in order to prevent collisions attacks for
|
|
messages of different sizes. Therefore in the DPASum() case, appending j
|
|
to the message is not sufficient as it would be possible to find collisions
|
|
for messages of size (DPA_BLOCK_SIZE*a + j) and (DPA_BLOCK_SIZE*b + j) were
|
|
obviously a and b are different.
|
|
|
|
Remark: The fact that the IV and the master key are initially fixed is not
|
|
a problem in itself since we are dealing with MDC here.
|
|
|
|
|
|
--[ 8.4 - A (2nd) preimage attack
|
|
|
|
Because of the hash function construction properties, being given a
|
|
message X, it is trivial to create a message X' such as h(X) = h(X'). This
|
|
is called building a 2nd preimage attack.
|
|
|
|
We built a quick & dirty program to illustrate it (Annexe C). It takes a
|
|
32 bytes message as input and produces an other 32 bytes one with the same
|
|
hash:
|
|
|
|
$ cat to.hack | hexdump -C
|
|
00000000 58 41 4c 4b 58 43 4c 4b 53 44 4c 46 4b 53 44 46 |XALKXCLKSDLFKSDF|
|
|
00000010 58 4c 4b 58 43 4c 4b 53 44 4c 46 4b 53 44 46 0a |XLKXCLKSDLFKSDF.|
|
|
00000020
|
|
$ ./dpa -s to.hack
|
|
6327b5becaab3e5c61a00430e375b734
|
|
$ gcc break_hash.c *.o -o break_hash -I ./include
|
|
$ ./break_hash to.hack > hacked
|
|
$ ./dpa -s hacked
|
|
6327b5becaab3e5c61a00430e375b734
|
|
$ cat hacked | hexdump -C
|
|
00000000 43 4f 4d 50 4c 45 54 45 4c 59 42 52 4f 4b 45 4e |COMPLETELYBROKEN|
|
|
00000010 3e bf de 93 d7 17 7e 1d 2a c7 c6 70 66 bb eb a3 |>.....~.*..pf...|
|
|
00000020
|
|
|
|
Nice isn't it ? :-) We were able to write arbitrary data in the first 16
|
|
bytes and then to calculate the next 16 bytes so that the 'hacked' file had
|
|
the exact same hash. But how did we do such an evil thing?
|
|
|
|
Assuming the size of both messages is 32 bytes then:
|
|
|
|
h(Mi) = E(E(Mi(0) xor IV,K0) xor Mi(1),K0)
|
|
|
|
Therefore, it is obvious that:
|
|
|
|
h(M1) = h(M2) is equivalent to
|
|
E(E(M1(0) xor IV,K0) xor M1(1),K0) = E(E(M2(0) xor IV,K0) xor M2(1),K0)
|
|
|
|
Which can be reduced to:
|
|
E(M1(0) xor IV,K0) xor M1(1) = E(M2(0) xor IV,K0) xor M2(1)
|
|
|
|
Which therefore gives us:
|
|
M2(1) = E(M2(0) xor IV,K0) xor E(M1(0) xor IV,K0) xor M1(1) [A]
|
|
|
|
Since M1,IV,K0 are known parameters then for a chosen M2(0), [A] gives us
|
|
M2(1) so that h(M1) = h(M2).
|
|
|
|
Remark 1: Actually such a result can be easily generalized to n bytes
|
|
messages. In particular, the attacker can put anything in his message and
|
|
"correct it" using the last blocks (if n >= 32).
|
|
|
|
Remark 2: Of course building a preimage attack is also very easy. We
|
|
mentioned previously that we had for a 32 bytes message:
|
|
h(Mi) = E(E(Mi(0) xor IV,K0) xor Mi(1),K0)
|
|
|
|
Therefore, Mi(1) = E^-1(h(Mi),K0) xor E(Mi(0) xor IV,K0) [B]
|
|
|
|
The [B] equation tells us how to generate Mi(1) so that we have h(Mi) in
|
|
output. It doesn't seem to be really a one way hash function does it ? ;-)
|
|
Building a hash function out of a block cipher is a well known problem in
|
|
cryptography which doesn't only involve the security of the underlying
|
|
block cipher. One should rely on one of the many well known and heavily
|
|
analyzed algorithms for this purpose instead of trying to design one.
|
|
|
|
|
|
--[ 9 - Conclusion
|
|
|
|
We put into evidence some weaknesses of the cipher and were also able to
|
|
totally break the proposed hash function built out of DPA. In his paper,
|
|
Veins implicitly set the bases of a discussion to which we wish to deliver
|
|
our opinion. We claim that it is necessary to understand properly
|
|
cryptology. The goal of this paper wasn't to illustrate anything else but
|
|
that fact. Being hacker or not, paranoid or simply careful, the rule is the
|
|
same for everybody in this domain: nothing should be done without reflexion.
|
|
|
|
|
|
--[ 10 - Greetings
|
|
|
|
#TF crypto dudes for friendly and smart discussions and specially X for
|
|
giving me a lot of hints. I learned a lot from you guys :-)
|
|
#K40r1 friends for years of fun ;-) Hi all :)
|
|
Finally but not least my GF and her kindness which is her prime
|
|
characteristic :> (However if she finds out the joke in the last sentence
|
|
I may die :|)
|
|
|
|
|
|
--[ 11 - Bibliography
|
|
|
|
[DPA128] A Polyalphabetic Substitution Cipher, Veins, Phrack 62.
|
|
[FakeP63] Keeping 0day Safe, m1lt0n, Phrack(.nl) 63.
|
|
[MenVan] Handbook of Applied Cryptography, Menezes, Oorschot & Vanstone.
|
|
[Knud99] Correlation in RC6, L. Knudsen & W. Meier.
|
|
[CrypTo] Two balls ownz one, http://fr.wikipedia.org/wiki/Cryptorchidie
|
|
[Vaud] An Experiment on DES - Statistical Cryptanalysis, S. Vaudenay.
|
|
[Ryabko] Adaptative chi-square test and its application to some
|
|
cryptographic problems, B. Ryabko.
|
|
[CourtAlg] How Fast can be Algebraic Attacks on Block Ciphers ?, Courtois.
|
|
[BiSha90] Differential Cryptanalysis of DES-like Cryptosystems, E. Biham
|
|
& A. Shamir, Advances in Cryptology - CRYPTO 1990.
|
|
[Matsui92] A new method for known plaintext attack of FEAL cipher, Matsui
|
|
& A. Yamagishi, EUROCRYPT 1992.
|
|
[NSA2007] Just kidding ;-)
|
|
[SHOUP] NTL library, V. Shoup, http://www.shoup.net/ntl/
|
|
[NIST2007] NIST, http://www.csrc.nist.gov/pki/HashWorkshop/index.html, 2007
|
|
|
|
|
|
--[ Annexe A - Breaking the linearised version
|
|
|
|
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
|
|
/* Crappy C/C++ source. I'm in a hurry for the paper redaction so don't
|
|
* blame me toooooo much please ! :> */
|
|
|
|
#include <iostream>
|
|
#include <fstream>
|
|
#include <openssl/rc4.h>
|
|
#include <NTL/ZZ.h>
|
|
#include <NTL/ZZ_p.h>
|
|
#include <NTL/mat_GF2.h>
|
|
#include <NTL/vec_GF2.h>
|
|
#include <NTL/GF2E.h>
|
|
#include <NTL/GF2XFactoring.h>
|
|
#include "dpa.h"
|
|
|
|
using namespace NTL;
|
|
|
|
void
|
|
S_E2 (unsigned char *key, unsigned char *block, unsigned int s)
|
|
{
|
|
int i;
|
|
for (i = 0; i < DPA_BLOCK_SIZE; ++i)
|
|
{
|
|
block[i] ^= (key[i] ^ s) % 256;
|
|
}
|
|
return;
|
|
}
|
|
|
|
void
|
|
E2 (unsigned char *key, unsigned char *block, unsigned int shift)
|
|
{
|
|
rbytechain (block);
|
|
rbitshift (block, shift);
|
|
S_E2 (key, block, shift);
|
|
}
|
|
|
|
void
|
|
DPA_ecb_encrypt (DPA_KEY * key, unsigned char * src, unsigned char * dst)
|
|
{
|
|
int j;
|
|
memcpy (dst, src, DPA_BLOCK_SIZE);
|
|
for (j = 0; j < 16; j++)
|
|
E2 (key->subkey[j].key, dst, key->subkey[j].shift);
|
|
return;
|
|
}
|
|
|
|
void affichage(unsigned char *chaine)
|
|
{
|
|
int i;
|
|
for(i=0; i<16; i++)
|
|
printf("%.2x",(unsigned char )chaine[i]);
|
|
printf("\n");
|
|
}
|
|
|
|
unsigned char test_p[] = "ABCD_ABCD_12____";
|
|
unsigned char test_c1[16];
|
|
unsigned char test_c2[16];
|
|
DPA_KEY key;
|
|
RC4_KEY rc4_key;
|
|
|
|
struct vect {
|
|
unsigned char plaintxt[16];
|
|
unsigned char ciphertxt[16];
|
|
};
|
|
|
|
struct vect toto[128];
|
|
unsigned char src1[16], src2[16];
|
|
unsigned char block1[16], block2[16];
|
|
|
|
int main()
|
|
{
|
|
|
|
/* Key */
|
|
unsigned char str_key[] = " _323DFF?FF4cxsdé&";
|
|
DPA_set_key (&key, str_key, DPA_KEY_SIZE);
|
|
|
|
/* Init our RANDOM generator */
|
|
char time_key[16];
|
|
snprintf(time_key, 16, "%d%d",(int)time(NULL), (int)time(NULL));
|
|
RC4_set_key(&rc4_key, strlen(time_key), (unsigned char *)time_key);
|
|
|
|
/* Let's crypt 16 plaintexts */
|
|
printf("[+] Generating the plaintexts / ciphertexts\n");
|
|
|
|
int i=0;
|
|
int a=0;
|
|
for(; i<128; i++)
|
|
{
|
|
RC4(&rc4_key, 16, src1, src1); // Input is nearly random :)
|
|
DPA_ecb_encrypt (&key, src1, block1);
|
|
RC4(&rc4_key, 16, src2, src2); // Input is nearly random :)
|
|
DPA_ecb_encrypt (&key, src2, block2);
|
|
|
|
for(a=0;a<16; a++)
|
|
{
|
|
toto[i].plaintxt[a] = src1[a] ^ src2[a];
|
|
toto[i].ciphertxt[a] = block1[a] ^ block2[a];
|
|
}
|
|
}
|
|
|
|
/* Now the NTL stuff */
|
|
|
|
printf("[+] NTL stuff !\n");
|
|
vec_GF2 m2(INIT_SIZE,128);
|
|
vec_GF2 B(INIT_SIZE,128);
|
|
mat_GF2 M(INIT_SIZE,128,128);
|
|
mat_GF2 Mf(INIT_SIZE,128,128); // The final matrix !
|
|
clear(Mf);
|
|
clear(M);
|
|
clear(m2);
|
|
clear(B);
|
|
|
|
/* Lets fill M correctly */
|
|
|
|
int k=0;
|
|
int j=0;
|
|
for(k=0; k<128; k++) // each row !
|
|
{
|
|
for(i=0; i<16; i++)
|
|
{
|
|
for(j=0; j<8; j++)
|
|
M.put(i*8+j,k,(toto[k].plaintxt[i] >> j)&0x1);
|
|
}
|
|
}
|
|
|
|
GF2 d;
|
|
determinant(d,M);
|
|
|
|
/* if !det then it means the vector were linearly linked :'( */
|
|
|
|
if(IsZero(d))
|
|
{
|
|
std::cout << "det(M) = 0\n" ;
|
|
exit(1);
|
|
}
|
|
|
|
/* Let's solve the 128 system :) */
|
|
|
|
printf("[+] Calculation of Mf\n");
|
|
for(k=0; k<16; k++)
|
|
{
|
|
for(j=0; j<8; j++)
|
|
{
|
|
for(i=0; i<128; i++)
|
|
{
|
|
B.put(i,(toto[i].ciphertxt[k] >> j)&0x1);
|
|
}
|
|
solve(d, m2, M, B);
|
|
|
|
#ifdef __debug__
|
|
std::cout << "m2 is " << m2 << "\n";
|
|
#endif
|
|
|
|
int b=0;
|
|
for(;b<128;b++)
|
|
Mf.put(k*8+j,b,m2.get(b));
|
|
}
|
|
}
|
|
|
|
#ifdef __debug__
|
|
std::cout << "Mf = " << Mf << "\n";
|
|
#endif
|
|
|
|
/* Now that we have Mf, let's make a test ;) */
|
|
|
|
printf("[+] Let's make a test !\n");
|
|
bzero(test_c1, 16);
|
|
bzero(test_c2, 16);
|
|
char DELTA_X[16];
|
|
char DELTA_Y[16];
|
|
bzero(DELTA_X, 16);
|
|
bzero(DELTA_Y, 16);
|
|
DPA_ecb_encrypt (&key, test_p, test_c1);
|
|
|
|
// DELTA_X !
|
|
unsigned char U0[] = "ABCDEFGHABCDEFG1";
|
|
unsigned char Y0[16];
|
|
DPA_ecb_encrypt (&key, U0, Y0);
|
|
|
|
for(i=0; i<16; i++)
|
|
{
|
|
DELTA_X[i] = test_p[i] ^ U0[i];
|
|
}
|
|
|
|
// DELTA_Y !
|
|
vec_GF2 X(INIT_SIZE,128);
|
|
vec_GF2 Y(INIT_SIZE,128);
|
|
clear(X);
|
|
clear(Y);
|
|
for(k=0; k<16; k++)
|
|
{
|
|
for(j=0; j<8; j++)
|
|
{
|
|
X.put(k*8+j,(DELTA_X[k] >> j)&0x1);
|
|
}
|
|
}
|
|
|
|
Y = Mf * X;
|
|
|
|
#ifdef __debug__
|
|
std::cout << "X = " << X << "\n";
|
|
std::cout << "Y = " << Y << "\n";
|
|
#endif
|
|
|
|
GF2 z;
|
|
for(k=0; k<16; k++)
|
|
{
|
|
for(j=0; j<8; j++)
|
|
{
|
|
z = Y.get(k*8+j);
|
|
if(IsOne(z))
|
|
DELTA_Y[k] |= (1 << j);
|
|
}
|
|
}
|
|
|
|
// test_c2 !
|
|
|
|
for(i=0; i<16; i++)
|
|
test_c2[i] = DELTA_Y[i] ^ Y0[i];
|
|
|
|
/* Compare the two vectors */
|
|
|
|
if(!memcmp(test_c1,test_c2,16))
|
|
printf("\t=> Well done boy :>\n");
|
|
else
|
|
printf("\t=> Hell !@#\n");
|
|
|
|
#ifdef __debug__
|
|
affichage(test_c1);
|
|
affichage(test_c2);
|
|
#endif
|
|
return 0;
|
|
}
|
|
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
|
|
|
|
|
|
--[ Annexe B - Probability evaluation of (hash32()%8 == 0)
|
|
|
|
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
|
|
#include <stdio.h>
|
|
#include <stdlib.h>
|
|
#include <string.h>
|
|
#include <time.h>
|
|
|
|
#define NBR_TESTS 0xFFFFF
|
|
|
|
int main()
|
|
{
|
|
int i = 0, j = 0;
|
|
char buffer[16];
|
|
int cmpt = 0;
|
|
int rand = (time_t)time(NULL);
|
|
float proba = 0;
|
|
srandom(rand);
|
|
for(;i<NBR_TESTS;i++)
|
|
{
|
|
for(j=0; j<4; j++)
|
|
{
|
|
rand = random();
|
|
memcpy(buffer+4*j,&rand,4);
|
|
}
|
|
checksum128 (buffer, buffer, 16);
|
|
if(!(hash32(buffer,16)%8))
|
|
cmpt++;
|
|
}
|
|
proba = (float)cmpt/(float)NBR_TESTS;
|
|
printf("[+] Probability is around %f\n",proba);
|
|
return 0;
|
|
}
|
|
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
|
|
|
|
|
|
--[ Annexe C - 2nd preimage attack on 32 bytes messages
|
|
|
|
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
|
|
#include <stdio.h>
|
|
#include <stdlib.h>
|
|
#include <string.h>
|
|
#include <sys/types.h>
|
|
#include <sys/stat.h>
|
|
#include <fcntl.h>
|
|
#include "dpa.h"
|
|
|
|
void
|
|
E2 (unsigned char *key, unsigned char *block, unsigned int shift)
|
|
{
|
|
rbytechain (block);
|
|
rbitshift (block, shift);
|
|
S_E (key, block, shift);
|
|
}
|
|
|
|
void
|
|
DPA_ecb_encrypt (DPA_KEY * key, unsigned char * src, unsigned char * dst)
|
|
{
|
|
int j;
|
|
memcpy (dst, src, DPA_BLOCK_SIZE);
|
|
for (j = 0; j < 16; j++)
|
|
E2 (key->subkey[j].key, dst, key->subkey[j].shift);
|
|
return;
|
|
}
|
|
|
|
void affichage(unsigned char *chaine)
|
|
{
|
|
int i;
|
|
for(i=0; i<16; i++)
|
|
printf("%.2x",(unsigned char )chaine[i]);
|
|
printf("\n");
|
|
}
|
|
|
|
int main(int argc, char **argv)
|
|
{
|
|
DPA_KEY key;
|
|
unsigned char str_key[] = "deadbeef";
|
|
unsigned char IV[] = "0123456789abcdef";
|
|
unsigned char evil_payload[] = "COMPLETELYBROKEN";
|
|
unsigned char D0[16],D1[16];
|
|
unsigned char final_message[32];
|
|
int fd_r = 0;
|
|
int i = 0;
|
|
|
|
if(argc < 2)
|
|
{
|
|
printf("Usage : %s <file>\n",argv[0]);
|
|
exit(EXIT_FAILURE);
|
|
}
|
|
|
|
DPA_set_key (&key, str_key,8);
|
|
if((fd_r = open(argv[1], O_RDONLY)) < 0)
|
|
{
|
|
printf("[+] Fuck !@#\n");
|
|
exit(EXIT_FAILURE);
|
|
}
|
|
|
|
if(read(fd_r, D0, 16) != DPA_BLOCK_SIZE)
|
|
{
|
|
printf("Too short !@#\n");
|
|
exit(EXIT_FAILURE);
|
|
}
|
|
|
|
if(read(fd_r, D1, 16) != DPA_BLOCK_SIZE)
|
|
{
|
|
printf("Too short 2 !@#\n");
|
|
exit(EXIT_FAILURE);
|
|
}
|
|
close(fd_r);
|
|
memcpy(final_message, evil_payload, DPA_BLOCK_SIZE);
|
|
blockchain(evil_payload, IV);
|
|
DPA_ecb_encrypt (&key, evil_payload, evil_payload);
|
|
blockchain(D0,IV);
|
|
DPA_ecb_encrypt (&key, D0, D0);
|
|
blockchain(D0,D1);
|
|
blockchain(evil_payload, D0);
|
|
memcpy(final_message+DPA_BLOCK_SIZE, evil_payload, DPA_BLOCK_SIZE);
|
|
|
|
for(i=0; i<DPA_BLOCK_SIZE*2; i++)
|
|
printf("%c",final_message[i]);
|
|
return 0;
|
|
}
|
|
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
|
|
|