Nicola's MAME Ramblings: 01/01/2007

Sunday, January 21, 2007

CPS2 Key Bit Order

As previously mentioned, the 64-bit keys I'm currently using should be the same as the hardware ones, except for a fixed permutation of the bits.

The permutation is actually irrelevant as far as the algorithm is concerned, since it is already taken into account when generating subkeys. The difference that it does make, however, is that there are strong suspicions that some of the keys are not random numbers, so what looks like random gibberish currently would show some order if we had the correct permutation.

Take the ssf2 versions for example. There are currently 6 different versions supported: World, USA, Asia, Japan, Tournament World, Tournament Japan. Here are the keys (in a different order):

3D9E1E15A58C32CE
3599DF35AD98284C
B74433502F4653D7
8758E3923FFA1A50
F0AE3D08420DD6BF
6260014FD857F7A7

there is something immediately obvious about those keys: they all contain exactly 32 0s and 32 1s.
When picking one random 64-bit numbers, the likelihood of this happening is about 1 in 10, so it's ok. But the likelihood of it happening for SIX numbers is about 1 in 1 million! So we can be pretty sure that those keys are not random numbers.

What is one particularly simple sequence that has exactly 32 1s? Well, of course 0123456789ABCDEF. And sure enough, after looking at the bits for a while and applying an appropriate permutation, here is what the above keys become:

0123456789ABCDEF
1032547698BADCFE
45673210CDEFAB89
67451032FEDC98BA
89ABDCEF45672301
CDEFBA9823016754

looks much better doesn't it?
Though there's no way to tell how close it is to the real thing.

Friday, January 19, 2007

CPS2 Moving Slowly Now

The improved attack works, and I've been able to recover a few more keys, but it takes a lot of computation time--several hours per game.

I've also experienced some failures, e.g. I could find the key for dimahoo but not for gmahou. This might have been because of a false positive when looking for pairs with the complementation property, so now I'm trying again with a different pair. Since the searches take so long, experimenting isn't easy.

On the plus side, I've applied the improved attack also to some games for which XOR files were not available, and it worked in a number of cases--though it failed in many others.

One of the new versions supported is mshb, which I is the first Brazilian version of a CPS2 game to work.

Andreas has published details on a theoretical attack using differential cryptanalysis which looks promising. If Andy's calculations are correct, it should allow to retrive the key for the three most problematic games: spf2t, spf2xj, and spf2ta. It does require a lot of (E,D,k) triplets, something in the order of 2¹⁶-2¹⁷, which is a bit steep, but we should be able to do that in a few cases.

One thing to note about my attack, which I might not have explained clearly, is that it requires a remarkably small amount of data: it has worked for many games with just 7 (E,D) pairs. The problem is that those pairs must all be at the same address (as far as the encryption is concerned; that is, bits 1-16 of the address must be the same). Those 7 pairs allow to retrieve the 96-bit key for the second round at that address, and once that is known, the 64-bit master key can be found in less than 1 second, without having to know ANY other (E,D) pair at any address.

Unfortunately for game like spf2t, whose full encryption range covers just 2 repetitions of an address, we are never going to have 7 pairs so the attack will never work.

Wednesday, January 17, 2007

CPS2 The Fight Continues

I've finished going through all the games previously supported by MAME using XOR files, and generating keys using this attack.

The attack needs a minimum of 7 (E,D) pairs at some address in order to work, though with just 7 pairs it takes several hours to find the key.

Most of the games provided at least 8 pairs, a few 7, so the attack worked.

On 11 games the attack didn't work. Three of them only provide 2 pairs, so there's no way for the attack to work--a different approach will be needed.

The others provide 4 pairs, and I'm now trying to still perform the attack, using a new trick.

Remember the complementation property? For every address A, we know that exists another address A₁ such that D(X, A) = D(X ^ 0xffff, A₁) ^ 0xffff. The problem is that we don't know A₁. We can search it, however, using the XOR data. Pick an address, look at the four (E,D) pairs associated to it, and then see if at another address there is a pair (E ^ 0xffff, D ^ 0xffff). That way we can put together the information from the two addresses, raising the number of pairs from 4 to 7, barely enough to run the attack.

There's a possibility of false positives when doing this, so avoid all pairs where E or D are 0x0000 or 0xffff, because those values are very common and make the probablity of a false positive much higher.

In theory this trick should work, though it will require some luck and a lot of time. The holy grail remains an attack which could use pairs from different addresses; that would be the only way to retrieve the key for the games that lack XORs.

Sunday, January 14, 2007

CPS2 Getting Closer

The correlations between the 96-bit keys of the two Feistel networks were crucial in getting the s-boxes with 4 or 5 inputs "in sync"--that is, make them idential to the real ones apart from a fixed XOR or permutation applied to the whole box.

Eventually, I ended with a layout which I'm 99.9% sure is equivalent to the real one. We cannot know the exact contents of the real s-boxes without getting them from the actual hardware, but the current ones should be matematically equivalent.
The result is here: http://xoomer.alice.it/nicola.salmoria/cps2crptv2.zip.

The most notable news is that the key is now reduced to 64 bits, and the one we are currently using should be identical to the one used by the hardware, apart from a fixed permutation of the bits.
Finding the real permutation would be nice, but obviously that's not something we can determine from the algorithm, since the order of the bits of the key is completely irrelevant.

What is interesting to note is that the keys used by some games don't seem to be random. If they were random one would expect there to be around 32 0s and 32 1s, but sometimes this isn't the case. E.g.

pzloop2:  3332206a0077f829
mshj:     01c0c951370f4c80
dstlka:   04048b4e2a498879
ringdest: 0405541367806575
cybotsj:  0404821534388354

Of these, the last three literally scream "I'm not a random number!". Guessing the right bit order to make something appear, of course, is another matter.
Some of the watchdog values contain birth dates, e.g. cmpi.l #$19660419,D1, so I expect the same thing might be happening here.
Also, it makes sense for the pzloop2 one to be more regular than the others because it's third party game.

On the key extraction front, things are going reasonably well. The brute force attack described in the previous article is working decently on most games, however for some of them the available data isn't enough. I'll have a more precise list once I've finished going through all the games. After that, we'll need to devise a better attack if we want to get the missing keys.

The discovery that the key is only 64 bits might help to construct a better attack, though at the moment I don't have many ideas. The fact that the algorithm is divided in two parts, with the output of the first one affecting the key on the second part, complicates things.

Friday, January 12, 2007

How to bruteforce CPS2

I couldn't devise an attack using differential cryptanalysis and the XOR files. The fact that the second 96-bit key changes with the address makes thing more difficult, and I couldn't think of a way to effectively use data from different addresses.

The XOR files give us no more than 8-16 (E,D) pairs at a given address (remember that the encryption only uses the low 16 bits of the address). We have no choice of what data we have, so we have to make the best of it.

So the idea I had is to use a meet in the middle brute force attack which exploits a weakness in the s-boxes.

Remember that

L₁ = R₀
R₁ = L₀ XOR f₁(R₀)

L₂ = R₁
R₂ = L₁ XOR f₂(R₁)

L₃ = R₂
R₃ = L₂ XOR f₃(R₂)

L₄ = R₃
R₄ = L₃ XOR f₄(R₃)

Remember also that the f_n round functions are made of 4 s-boxes each.
These are the 16 s-boxes, with their inputs and outputs:

f1b1  03457- 67
f1b2  12346- 35
f1b3  124567 14
f1b4  023567 02

f2b1  0246-- 46
f2b2  134567 03
f2b3  013457 17
f2b4  123567 25

f3b1  2346-- 35
f3b2  01357- 02
f3b3  012357 16
f3b4  024567 47

f4b1  01367- 03
f4b2  012456 47
f4b3  023457 12
f4b4  234567 56

f2b1 is the weakest link. It has only four inputs, 0246. To get those four inputs, we need just three boxes in round 1: f1b1, which outputs 67, f1b3, which outputs 14, and f1b4, which outputs 02. We don't need f1b2, which outputs 35.

So if we start from the ciphertext and run it through the first two rounds of the Feistel network, scanning all 2²⁴ possible keys for f1b1, f1b3, f1b4, and f2b1, we'll get all possible values for bits 46 of R2.

Now let's start from the plaintext instead, and go up. If we scan all 2⁶ keys for f4b2, we'll get all possible values for bits 47 of R2 again.

Now we impose that bit 4 calculated in these two ways is the same, and given enough (E,D) pairs we have a huge pruning of the valid keys. From experiments, at least 8 pairs are needed for the attack to work effectively, but with 12 it's faster.

When we have a match on bit 4, we start adding more key bits, 6 or 12 at a time. First f4b4, to match bit 6 of R2. Then f1b2 and f2b3, to match bit 7 of R2. At this point, we are most likely already left with a single key, so adding more key bits doesn't increase the computation time. The important thing is to do it 6 bits at a time, in order to avoid unnecessary calculations. Soon enough, we have reconstructed the whole 96-bit key.

This will be the key at a specific address--of course, before running the search we'd have scanned the whole data and selected the address with most different pairs, in order to make the attack more effective.

After the 96-bit key at one address is found, the rest is easy. Since the 96-bit key is modified by the 16-bit result of the first FN in a fixed way, we just use brute force to find the correct 16-bit value at every address, and then we have a full subkey table as when we had the full 8GB tables at the beginning.

The process is working reasonably well, it's taking me 20-40 minutes to find the 192-bit key for a game. It could be made faster, probably. I've done some games for which I had more data, and Razoola was kind enough to provide additional data that will help with many other games. I'm not yet sure that the attack will work on all games, but it surely will on a good percentage.

Some very good news after obtaining more keys is that I found strong correlations between the first and second 96-bit keys, so they are effectively the same key, just permuted. This will also allow to fix the order of the subtables in the s-boxes of the first FN, in the same way I did for the second.

Should we consider ourselves finished after that?

Not yet: whether it will be possible to generate keys for games for which we lack XOR data is an open problem. In that case, the best we can expect to do is to match the startup code from a different version of the game, so we'll have several (E,D) pairs at consecutive addresses, and no more than one pair at a given address. A new attack will have to be devised in order to use that kind of information. Hopefully the discovery that the first and second 96-bit keys are correlated will help.

Tuesday, January 09, 2007

CPS2 coming to MAME

I've submitted to MAMEDEV the code to replace the 4GB CHDs with 192-bit keys.
The code is here: http://xoomer.alice.it/nicola.salmoria/cps2crpt.zip
It contains all the information needed to understand the algorithm, including the s-boxes.

There is still a lot of uncertainty about the contents of the s-boxes. They do their job, but since they are different from the originals they also affect the key. Using the real s-boxes might make correlations between the key bits more apparent, if they exist (anyway, having only 7 keys to look at, there's not enough data to speculate on any kind of correlation).

I'm hoping that it will be possible to extract the s-boxes contents from photos of the decapped CPS2 chip. That would be an important step forward in the accuracy of the emulation.

Now the next challenge is finding the keys for the other CPS2 games that have XOR files but not full tables. I have some ideas, but it doesn't look simple.
We have a lot of (E,D,k) triplets for those games, but unfortunately not many once you fix k (the 96-bit key used in the second FN), and that would be the most important part.

Note that we don't have to find the whole 192-bit key all at once. Once the second 96-bit was found, then we could easily use brute force to get the 16-bit key seed at every address, and therefore easily find the first 96-bit key.

Monday, January 08, 2007

CPS2 Fundamental Breakthrough

Today I realised why I was having problems with the correlations between subkey bits, where some bits had to be derived from the others not only using XOR, but also AND and OR.

The reason was again in the s-boxes that only have 4 or 5 data inputs instead of the full 6. Those boxes contain 2 or 4 subtables, and the order of the subtables inside the box or the order of the bits inside each subtable isn't particularly important when decrypting a single address, because they can always be compensated by appropriate bits in the subkey.

However, when putting all subkeys together, those things became important. E.g. I could be finding that when using subtable #3 in a box, then two of the inputs had to be inverted. To fix this, I had to rearrange the subtables inside the s-boxes to make everything in sync.

The result was excellent. Previously, some of the correlations among the 96 bits were overly complex, taking up to 4 bits at once. Now every bit is directly correlated to another. The only exception is the bits that correspond to the "empty" inputs of s-boxes that take only 4 or 5 data inputs. What this means is that all 6 inputs of all s-boxes always receive the XOR of three sources: either a data bit, a key seed bit, and a global key bit; or two key seed bits, and a global key bit.

That's only the beginning of the good news, though. After fixing the s-boxes, I could attack again the first part on the algorithm on the assumption it was a 4-round Feistel network, and indeed it was.

So the algorithm turns out to be like this:

Take the 16-bit address and a 96-bit key and run them through the first Feistel network, to produce a 16-bit subkey.

Take the 16-bit ciphertext, 16-bit subkey, and another 96-bit key and run them through the second Feistel network, to produce the 16-bit plaintext

So now, for the first time, I would be able to produce a program that algorithmically decrypts one of the game for which we have full tables, without using any table.

It's not yet time to celebrate, though: the s-boxes for the first Feistel network still have to be properly determined, and this might require having access to some more games.

Once that is done, producing keys for games that we have complete tables for will be quite simple.

Finding keys for games for which we don't have full tables, however, is an entirely different matter. As said above, we are potentially dealing with a 192-bit key here; it's possible that the key is smaller and the bits are reused, however since we don't know how they would be, we have to treat it like a 192-bit one. For a key of that size, obviously brute force is out of the question; and while the XOR tables do provide some information, it's probably not the kind of information that would allow to use the differential cryptanalysis techniques we'd need.

Sunday, January 07, 2007

CPS2 Won't Be Tamed That Easily

Some people thought we were almost finished with the CPS2 algorithm, but this doesn't seem to be the case.
True, at least the 4GB tables could now be replaced with a single 128kB table, but that's still not how the hardware works, and it wouldn't be possible to generate proper keys for the games that are missing the full 8GB tables, which is the main concern.

From the full tables, we can extract a 96-bit subkey for every address.
Edit: I had a link to a file here. I've removed it since I've found an error which I'll correct soon.
These wouldn't be the actual keys used by the hardware, however I think me and Andy have agreed that they are, apart from a fixed XOR and bit permutation that wouldn't change from game to game. So we can forget about this step of the decryption for the time being, and move on.

The problem now is to understand how the hardware generates the 96-bit subkey starting from the address and from the global key.

I have determined that only 16 bits are needed to generate the 96-bit subkey; this was sort of expected, but it isn't without problems.
We can generate only 94 bits of the subkey starting from those 16 bits and using only XOR operations. The remaining 2 bits need AND/OR operations, something which I have no explaination for at this point.

That was the least of the problems anyway. The major hurdle at this point is that the mapping from address to 16-bit key seed is very far from trivial.

The scheme should be as follows:

take 16-bit address, N-bit global key, and generate a 16-bit key seed using an unknown algorithm.

take 16-bit key seed, M-bit global key, and generate 96-bit subkey using an unknown algorithm.

take 16-bit ciphertext, 96-bit subkey, and generate 16-bit plaintext using the algorithm we have discovered.

I think it would make sense for the algorithm in step 1. to be another 4-round Feistel network.
If this is the case, things are quite harder than before. To break the other Feistel network, we could rely on complete knowledge of ciphertext-plaintext relationship. Now we can't: we only have a vague idea of what the 16-bit key seed could be.

If we could rely on a 16-bit value except for a constant XOR and permutation, it wouldn't be a problem, since that wouldn't change the nature of the Feistel network. Unfortunately, we don't have that luxury.

Let's see that with an example. Let's say that the first algorithm generates a 4-bit key seed, abcd, which is expanded into the 6-bit subkey ABCDEF this way:

A = a
B = a XOR b
C = a XOR c
D = b XOR c
E = b XOR d
F = b XOR c XOR d

We don't know anything about abcd, all we see is ABCDEF, but we need to guess what abcd looks like. So we notice that

D = B XOR C
F = A XOR C XOR E

and we decide that

a' = A
b' = B
c' = C
d' = E

so we would have

A = a'
B = b'
C = c'
D = b' XOR c'
E = d'
F = a' XOR c' XOR d'

This all works as far as generating the subkey from the seed goes, the problem is that abcd and a'b'c'd' are two completely different numbers! We have that

a'b'c'd' = abcd XOR aab

so this isn't simply a XOR with a constant value, it's a variable modification of the number. And this is something that the Feistel network cannot handle.

Saturday, January 06, 2007

CPS2 Subkeys

I said earlier that I was going to post the program to extract subkeys from a CHD, and the subkey lists for a few games, however after extracting the subkeys I found important relationships between the subkey bits so I'll have to work on that first.

A subkey consists of 24*4 = 96 bits. There are 65536 subkeys, so the total is 786kB of data.

What I found that one bit of the subkey is constant, while 59 can be derived from the others with a XOR (which is constant for all 65536 addresses, but changes from game to game).

So this hints at a 64-bit global key, while the independent subkey bits drop for now at 96-1-59 = 36 bits.

Friday, January 05, 2007

CPS2 S-Boxes ready

Andy has published his program but is reporting some problems in recreating the original tables--the values generated by his program are XORed with a constant value when compared with the original table.

I have just verified that this doesn't happen to me, so I presume this is due to a different method we used to reconstruct the Feistel function for the first two rounds.

Let's recap how this was done. The fundamental idea was published by Andy:

since R₄ = L₃ XOR F₄(L₄), if you pick (E,D) and (E',D') such that L₄ = L'₄, then

R'₄ = L'₃ XOR F₄(L'₄) = L'₃ XOR F₄(L₄) = L'₃ XOR R₄ XOR L₃

therefore

L₃ XOR L'₃ = R₄ XOR R'₄

Now if we pick (E,D) and (E',D') such that L₄ = L'₄ and E and E' differ by only one bit, we can see how flipping a single bit in the input changes the output after the third round. Andy's fundamental discovery was that flipping certain bits never affects certain other bits.

That knowledge can easily be used to determine F₄. Take (E,D) and (E',D'), where E and E' differ only by one bit which we know doesn't affect bit b in L₃. Then we know that

B_b(L₃ XOR L'₃) = 0

where B_b(x) gives me the value of bit b of x.

Therefore

B_b(R₄ XOR F₄(L₄) XOR R'₄ XOR F₄(L'₄)) = 0

that is,

B_b(F₄(L₄) XOR F₄(L'₄)) = B_b(R₄ XOR R'₄)

Do that for all b and for all E, E' pairs, and you quickly reconstruct the whole F₄ structure. I'm saying its structure and not F₄ itself, because it's all done through XOR so you need to set F₄(0) arbitrarily, then all the others are derived from it through XOR.

After F₄ is reconstructed, it can be used to undo the fourth round of the encryption. Then you repeat exactly the same process to reconstruct F₃ (still XOR an arbitrary value). Then use F₃ to undo the third round, and you are left with just two rounds.

Now things get really simple, because by definition

L₂ = L₀ XOR F₁(R₀)
R₂ = R₀ XOR F₂(L₂)

and since you know all of L₀, R₀, L₂, and R₂, F₁ and F₂ are immediately derived as

F₁(R₀) = L₀ XOR L₂
F₂(L₂) = R₀ XOR R₂

Note that here we derive F₁ and F₂ explicitly, not XOR an arbitrary value.

Wednesday, January 03, 2007

CPS2 is not a 4-round Feistel network (...NOT)

Edit: As Andy pointed out, the following reasoning is fatally flawed by the fact that the f_n functions are not bijective.

So, given the strong evidence Andy has found, it really looks like the algorithm is a 4-round Feistel network.

This is actually good news, I have to say I am relieved ;)

Remember that

L₁ = R₀
R₁ = L₀ XOR f₁(R₀)

L₂ = R₁
R₂ = L₁ XOR f₂(R₁)

L₃ = R₂
R₃ = L₂ XOR f₃(R₂)

L₄ = R₃
R₄ = L₃ XOR f₄(R₃)

In the previous article I showed that if we take (E,D) and (E',D') such that L₀ XOR L₄ = L'₀ XOR L'₄ and R₀ = R'₀, then

R₂ = R'₂.

I'll now take a different route from the one I took yesterday, to avoid the flaw that Andy noticed.

From R₂ = R'₂ follows that

L₁ XOR f₂(R₁) = L'₁ XOR f₂(R'₁), that is

R₀ XOR f₂(R₁) = R'₀ XOR f₂(R'₁), that is (since R₀ = R'₀)

f₂(R₁) = f₂(R'₁), that is (since f2 is bijective)

R₁ = R'₁, that is

L₀ XOR f₁(R₀) = L'₀ XOR f₁(R'₀), that is (since R₀ = R'₀)

L₀ = L'₀, therefore

(E,D) = (E',D')

Therefore two different pairs with the requested property cannot exist.

When converting the 16-bit values to and from (L,R) pairs, we need to take the initial and final permutation into account. Note that we are not using R₄ in the above; we are only using L₀, R₀, and L₄. Also, we don't care about the effects of the permutation on L₀ and R₀, the only thing we care about is that when we do L₀ XOR L₄ we XOR the correct bits together. So we can arbitrarily fix the permutation on L₀ and R₀, and try all possible permutations on L₄.

So if the CPS2 algorithm were a Feistel network, for some permutation of the bits in L4 it should not be possible to select two different pairs (E,D) and (E',D') with the requested property.

This doesn't happen: for every permutation, such pairs exist. Therefore CPS2 is not just a 4-round Feistel network.

This is not to say that Andy isn't on the right track, on the contrary, the findings in CPS-2 (5) are extremely interesting, but the algorithm cannot be just a 4-round Feistel network. There's something else we are missing.

Tuesday, January 02, 2007

CPS2 is not a 4-round Feistel network (maybe)

Edit Jan 3: Andy has pointed out a flaw in this article. The order of the bits in the L and R groups is important, so we cannot draw any conclusion yet.

Andreas Naive has been doing some very interesting observations lately.

I thought it would make sense to check if we were lucky and the algorithm was indeed structured like a 4-round Feistel network as Andy supposed.

The basic idea of a Feistel network if that you have a round function f() which operates on some bits of the data (8 in our case), and modifies them depending on a key. The function f is the same on every round, while the key changes.

We don't need to care about a key at this point, since we don't know anything about the algorithm. It's easier (and more general) to simply assume that at every round a different function f_n is used, where f_n describes a permutation of the values 0 to 255.

We have decided that there is strong evidence indicating that the 16 bits of the data should be split in two groups of 8 bits like this:
L = { 0, 1, 2, 4, 6, 7, 13, 14 }
R = { 3, 5, 8, 9, 10, 11, 12, 15 }

So a 4-round Feistel network would work like this:
Given the ciphertext E = (L₀, R₀),

L₁ = R₀
R₁ = L₀ XOR f₁(R₀)

L₂ = R₁
R₂ = L₁ XOR f₂(R₁)

L₃ = R₂
R₃ = L₂ XOR f₃(R₂)

L₄ = R₃
R₄ = L₃ XOR f₄(R₃)

and the resulting plaintext is D = (L₄, R₄)

By the properties of a Feistel network, we can also go backwards, starting from D = (L₄, R₄)

R₃ = L₄
L₃ = R₄ XOR f₄(L₄)

R₂ = L₃
L₂ = R₃ XOR f₃(L₃)

R₁ = L₂
L₁ = R₂ XOR f₂(L₂)

R₀ = L₁
L₀ = R₁ XOR f₁(L₁)

Now let's start from both sides and meet in the middle. Going from E to D we have

L₂ = R₁ = L₀ XOR f₁(R₀)

Going from D to E we have

L₂ = R₃ XOR f₃(L₃) = L₄ XOR f₃(R₂)

Putting the two together we have

L₀ XOR f₁(R₀) = L₄ XOR f₃(R₂), that is

L₀ XOR L₄ = f₁(R₀) XOR f₃(R₂)

Now remember that we know all (E,D) pairs, so we can take advantage of that.
If we take (E,D) and (E',D') such that L₀ XOR L₄ = L'₀ XOR L'₄, then it follows that

f₁(R₀) XOR f₃(R₂) = f₁(R'₀) XOR f₃(R'₂)

If additionally E and E' are such that R₀ = R'₀, then we are left with just

f₃(R₂) = f₃(R'₂)

and since f₃ is a bijective function, this implies that

R₂ = R'₂, that is

L₃ = L'₃, that is

R₄ XOR f₄(L₄) = R'₄ XOR f₄(L'₄), that is

f₄(L₄) XOR f₄(L'₄) = R₄ XOR R'₄

Now we can repeat the procedure for all (E,D) pairs, and get a large number of equations on f₄.

If the CPS2 encryption was a 4-round Feistel network, all those equations would not be contradictory, and they would allow us to reconstruct the structure of f₄.

Unfortunately, they are contradictory. E.g. you get something like this:

f₄(0xff) XOR f₄(0xfc) = 0x11
f₄(0xff) XOR f₄(0xfc) = 0x48

Therefore CPS2 is not a 4-round Feistel network.