Tuesday, December 19, 2006

CPS2 notes, part 5

In part 1 I showed some constant values in the table that shows how many times flipping a bit in the ciphertext flips a bit in the plaintext.
But there's more than that. It seems that there are more properties which are true for all games, regardless of the key.

      0    1    2    3    4    5    6    7
0 7FE0 75B0 8028 7AF4 6EB0 8284 734C 7420
1 8208 7FE0 7E48 7FA4 81CC 8050 72F8 7FDC
2 7F38 7F8C 8000 7FD0 87B0 7EF4 7BF8 8068
3 7200 6CA0 7B00 76E0 6800 8044 B280 8180
4 7F18 7FCC 7F84 8130 7BC4 80E4 8154 8014
5 8980 7B80 7780 81D4 6200 7D78 7540 7DE0
6 83C0 7E98 7F3C 831C 6CF0 8080 7BD8 7D20
7 80B0 81B0 7EAC 8190 7DB4 8050 8364 7F68
8 7D80 80E0 7D80 8110 78C0 7FC4 6F40 8DA0
9 5080 7510 C000 643C 6DC0 8414 4800 72C0
10 7BE0 7A30 7F40 7F08 6A00 8110 7720 7B80
11 8040 81A0 7E80 7E2C 5680 7FFC 73C0 8020
12 7A80 8370 7100 818C 7E80 7BE8 6200 8060
13 8018 7D24 8040 7F60 7E2C 80EC 7F9C 7EA0
14 7D18 7F00 7EEC 80DC 799C 7F90 7F34 8010
15 7E80 8710 8540 8128 8200 7F50 7C40 8540

8 9 10 11 12 13 14 15
0 8054 7C58 7BF4 7E88 7C80 7610 7C7C 7C28
1 8008 7FA4 80F0 8020 7FB8 7E24 7EE8 8078
2 7FB8 7F18 7FE4 7F14 7FDC 80C0 7C3C 8130
3 80DC 7F34 7EE0 8124 77F0 7800 5A80 7064
4 800C 7F98 7D3C 7E60 803C 7C74 80E0 80A0
5 8024 7BA8 79C8 8020 7A24 7E00 7F60 7AAC
6 7F80 7E94 7B08 7FE8 7E98 6DA0 7FD4 7E50
7 7FE8 7E20 7D6C 7F88 8004 7BD4 803C 7FCC
8 7F4C 7FE4 7E48 7F10 7F40 7FC0 74C0 82B4
9 812C 8064 6520 78D8 7D5C 5040 5800 8588
10 7D50 7FB8 883C 7E00 7EDC 8A00 78E0 7FDC
11 7EF8 7F64 7860 7E30 804C 6780 82C0 81E8
12 8094 7CCC 7A44 7EF0 7FC0 7C80 8100 7EF4
13 8180 7FA0 7FF4 8084 8000 8088 7F9C 8024
14 8004 8080 809C 7F44 806C 8044 7A5C 7FEC
15 7EC0 814C 7A10 7F2C 7E80 9200 7E20 8064

    are constant.

    are correlated and can be only one of 2 quadruplets.

                are all correlated and values can be taken in one of 4 "groups".

Once chosen the group:
    are correlated and can be only one of 2 pairs.
    are correlated and can be only one of 2 pairs.
    are correlated and can be only one of 4 pairs.
    are fixed by     and    .

So in total there are 2x4x2x2x4 = 128 combinations. Each combination is used by exactly 256 tables.

Note the symmetry of it all. 16 values are affected, related to 2 bits of the ciphertext and 8 bits of the plaintext.

This suggests that the algorithm might work on the data 8 bits at a time, or even only 4 bits at a time. The regular behaviour could be caused by a step, at the beginning or at the end of the algorithm, that doesn't use the key.

Another thing we can do is check which of the 128 combinations corresponds to each memory address, and look for a way to correlate the two. Unfortunately there doesn't seem to be a simple way to do that.

CPS2 notes, part 4

I have to stress that there is NO progress being made in breaking the encryption. The things I am showing in these notes have been known for over a year and haven't lead to any breakthrough. Hopefully, a public discussion of these properties could generate some valuable feedback.

There also isn't any kind of competition against the people that are attacking the CPS2 encryption in hardware. On the contrary, I think that the hardware path will be the only way to gather more information about the algorithm.

In these notes I have shown several properties that are always true, regardless of the key. This means that they are properties of the algorithm itself, and therefore are hardcoded in hardware. For example, there almost surely are fixed substitution boxes which could be extracted from the custom CPU.

If an algorithm is reconstructed by studying the CPU, we can also test it against the known properties, regardless of the key. If it doesn't match, then the algorithm isn't right. Once an algorithm fitting the properties is found, we can start looking for the key.

Monday, December 18, 2006

CPS2 notes, part 3

In part 1 we were flipping bits in the ciphertext, and seeing what happened to bits in the plaintext.

We can do the opposite, of course, and there's something unexpected in the results.
I'll show the total number of times the bits change (in hex) instead of percentages, to make the point more visible.
      0    1    2    3    4    5    6    7
0 80BC 7844 7E24 7FD4 7EBC 8090 8138 8044
1 7FB0 8138 7A90 7EB0 7D54 7F04 7FE4 8074
2 7F20 7F30 81A4 8018 80C4 7FB4 7F08 7EF8
3 7F30 8180 7D10 7EA0 6F40 7E30 85E0 79E0
4 7EA8 7B1C 8240 7FD4 7DE4 7F24 7F44 80C4
5 A2C0 6D00 6CC0 69E0 6700 7F28 5E00 4E00
6 7F5C 7ECC 7FB8 7FF8 7C38 7FF0 7EF0 7CAC
7 7FD0 7D80 8168 8058 8104 7FA0 7FAC 80E8
8 71E0 B500 8E80 8020 7900 8124 7980 8240
9 7A20 6E40 7AE0 7FA8 8080 7F84 8240 88E0
10 7930 6900 7A50 81BC 7220 7C48 7A20 6FC0
11 7950 8C00 7600 8054 6F20 806C 7CE0 89C0
12 6F40 4B00 85C0 8010 7600 809C 7E00 7E80
13 7F30 7B20 7FB8 8028 803C 80B4 7E80 8140
14 7A90 8034 7D4C 8124 80D8 7E8C 7FC8 7FC0
15 73F0 79C0 7990 80D0 9440 7D94 7AC0 6EA0

8 9 10 11 12 13 14 15
0 8048 7E84 7DB8 801C 80B8 80A4 62BC 7FB0
1 803C 7FE8 7E20 81E8 80F0 7D68 8144 7FBC
2 807C 80C4 7F80 7FFC 7F90 7EE0 82B0 80B4
3 7EB0 80E0 7E68 7F70 7E78 8010 7580 7BBC
4 7FF0 7F68 7FB8 7FFC 7FAC 8074 7DD4 813C
5 7F08 6FA0 7254 6BB8 6C10 6380 6300 7AE8
6 7FF4 7F9C 8018 7FCC 8138 8118 6E14 80BC
7 8088 80DC 7FC0 7DB8 7F8C 7EE4 84E0 7F48
8 7EEC 8090 84EC 7E28 8328 8960 2D00 85AC
9 7F94 7EA4 7D6C 8080 8068 8200 5FC0 82F0
10 7AF0 80AC 80C8 7CB4 8054 8640 7100 7F8C
11 81F0 8080 7F2C 8064 7F94 82C0 6E80 7B30
12 809C 7FD0 7F98 7ED0 75C8 83C0 D300 7CE8
13 8134 8160 80A0 7F40 7D6C 809C 80E4 7F58
14 7F68 7F14 7F44 7F28 8010 7EB4 7F10 7F80
15 82E0 7D4C 78D8 804C 8000 78A0 6240 75DC

The values highlighted in orange are the only two that are constant in every table (there were four in the inverse table).

But the values highlighted in red are the most interesting. They are not constant--they can vary among a few possible values. But the sum of the two values in each column is constant, and it's always 0x10000.

Why? I have no idea.

CPS2 notes, part 2

The complementation property was an important discovery--and not just because it reduces the size of the tables in half. Still, we haven't taken full advantage of it yet.

Let's recap the basics of the CPS2 encryption first.

Inputs:
  • 16-bit value stored in ROM

  • 16-bit address (bits 1-17 of the physical address)

  • key of unknown size, different for every game

Outputs:
  • 16-bit decrypted value


Let's call D(X,A,K) the decryption of value X at address A using key K.
The complementation property says that for every A there is exactly one A1 such that

D(X,A,K) = D'(X',A1,K)

where x' is the complement of x.

This finding is important because it shows that A has an algorithmical effect on the encryption. In Sega's FD1094 CPU, the key for every address is just stored in a huge table. If the CPS2 CPU worked in the same way, the complementation property wouldn't happen.
This isn't too much of a surprise: with the Kabuki CPU, we had already seen that Capcom preferred a complex algorithm with a small key, while Sega preferred a simpler algorithm with a huge key.

Unfortunately we don't yet know how to calculate A1 given A. It varies from game to game so it must be a function of the key.

The complementation property isn't unheard of, even in strong ciphers, so it isn't necessarily a weakness in the algorithm. For example, DES has it. In that case, it reads D(X,K) = D'(X',K').

In general, the complementation property indicates that there are probably XOR operations happening, which cause the complement operation to cancel out. Let's see this with an example: consider a substitution function f, and an algorithm such that

d = e XOR f(e XOR k)

if we take the complements we have

e' XOR f(e' XOR k') = e' XOR f(e XOR k) = (e XOR f(e XOR k))' = d'

of course this is a very simple example. Note that x' doesn't have to be the complement in this case: you can define it as e.g. x' = x XOR 1, and it will still work. So the CPS2 algorithm obviously isn't that simple.

A more realistic example would be a Feistel network (note that DES is an example of a Feistel network).

If you define the Feistel network as
Li = Ri-1
Ri = Li-1 XOR f(Ri-1 XOR Ki-1)

it should be easy to see how the complementation property would ensue.

The idea of the CPS2 encryption being a Feistel network is tempting, however I don't think this is the case, because I would expect the diffusion to be much better than what we have seen in part 1.

Sunday, December 17, 2006

CPS2 notes, part 1

Since the finding of the complementation property almost one year ago, there has been no progress at all on the CPS2 encryption.

I'm going to explain here some of the (remarkably few) things we know about the encryption.

A common misconception is that the decryption tables look like "random data". They may look so to the naked eye, but the most basic statistical checks show that this isn't the case.

Take an encrypted value, change a bit in it, and look at what happens to the output. For a random table, you'd expect every bit in the decrypted value to change with 50% probability. This isn't happening.
Take a look at the following statistics taken on a single table: on rows, you have the encrypted bit that changes; on columns, the frequency with which the decrypted bit changes.

0 1 2 3 4 5 6 7
0 50,26% 44,09% 50,98% 49,68% 42,94% 50,81% 44,99% 43,87%
1 49,52% 49,73% 49,87% 49,90% 49,32% 49,77% 45,37% 50,43%
2 49,62% 50,05% 49,67% 50,25% 51,56% 49,86% 49,07% 50,07%
3 44,53% 45,75% 48,05% 48,93% 40,63% 50,22% 69,73% 44,29%
4 49,83% 50,19% 49,62% 49,98% 48,35% 49,88% 50,38% 50,57%
5 50,24% 46,53% 50,78% 50,21% 28,13% 49,79% 44,63% 47,80%
6 51,92% 49,73% 50,10% 50,17% 38,28% 50,10% 48,35% 47,86%
7 50,29% 49,51% 49,16% 49,86% 48,35% 49,93% 50,85% 50,07%
8 51,95% 52,51% 51,37% 48,71% 49,22% 50,07% 43,55% 46,78%
9 31,45% 48,32% 75,00% 43,55% 46,39% 50,51% 28,13% 46,92%
10 50,20% 50,95% 51,37% 49,78% 28,13% 50,06% 47,17% 51,66%
11 45,17% 45,85% 47,66% 50,38% 25,20% 50,58% 44,43% 53,37%
12 47,85% 47,51% 44,14% 50,04% 53,03% 49,24% 38,28% 49,17%
13 49,68% 49,90% 49,43% 50,10% 50,02% 49,46% 50,05% 49,89%
14 49,35% 50,32% 49,98% 49,80% 48,29% 49,65% 49,48% 50,42%
15 48,97% 51,59% 51,07% 49,57% 67,19% 49,55% 45,70% 50,39%

8 9 10 11 12 13 14 15
0 50,43% 48,46% 49,02% 49,49% 49,21% 43,35% 49,05% 47,69%
1 49,85% 49,69% 50,57% 49,76% 50,01% 49,40% 49,88% 50,15%
2 50,14% 50,05% 49,86% 49,57% 50,25% 50,96% 48,99% 50,18%
3 49,46% 50,18% 48,31% 50,30% 46,96% 40,63% 35,35% 44,34%
4 49,82% 50,43% 49,70% 50,34% 49,79% 47,74% 50,16% 49,57%
5 49,91% 48,74% 49,68% 49,88% 48,79% 28,91% 49,02% 48,36%
6 49,74% 49,21% 50,24% 49,62% 49,40% 39,32% 49,91% 49,62%
7 50,14% 50,03% 49,63% 50,39% 49,65% 47,67% 50,43% 49,37%
8 49,83% 49,96% 48,57% 50,43% 49,15% 40,63% 46,97% 50,50%
9 50,15% 49,08% 44,80% 48,27% 49,65% 50,78% 34,38% 50,81%
10 49,76% 49,92% 50,28% 49,24% 49,21% 67,97% 49,27% 49,87%
11 47,55% 50,00% 50,16% 47,17% 49,61% 25,39% 48,93% 51,68%
12 49,83% 48,97% 49,69% 49,86% 49,30% 52,93% 50,39% 49,90%
13 49,85% 50,37% 50,52% 49,94% 49,60% 51,22% 49,96% 49,92%
14 49,85% 50,18% 49,97% 49,89% 50,09% 49,84% 47,50% 49,64%
15 49,49% 49,37% 49,39% 50,09% 49,57% 38,28% 49,17% 49,57%

So there is a large number of values around 50%, which look just random, but there are also values very far from that.

This indicates that the encryption algorithm doesn't have good diffusion. This is a weakness, though it hasn't been exploited yet.

Of particular interest are the four values I highlighted in red. While the other values change from game to game and from table to table, those four values are always the same. E.g. flipping bit 9 in the encrypted value causes bit 0 in the decrypted value to flip exactly 0x5080 times out of 0x10000, for every game, at every address.

This property is quite interesting. It is the most obvious "signature" of the algorithm. Does it help? Well, it tells us that if the algorithm contains bit permutations that depend on the key, those permutations cannot affect bits 3 and 9 in the encrypted value, nor bits 0 and 14 of the decrypted data. Apart from that, however, the property doesn't tell us much, because even if we know that the bits have to change that many times, we don't know exactly when to flip them. Discovering that would be a significant advance in the understanding of the algorithm.

Friday, December 08, 2006

Katamino

(This article is not MAME related)

Some time ago I was shopping looking for a set of Pentomino pieces and found this game called Katamino®.

It looked like a decent enough set of wooden pieces, so I bought it. Pentominoes are a decades old puzzle, so I didn't expect anything more than that. Instead, when I opened the box, I was blown away by how cleverly the puzzle is presented.
The idea is credited to André Perriolat, who is the owner of the company that produces the game, DJ Games.

Here is how it works: take four pentominoes, and make a 5x4 rectangle with them. After you've accomplished that, add another pentomino, and make a 5x5 square. Then add another, and make a 5x6 rectangle. And so on, until you add the final piece to make a 5x12 rectangle. The box contains a board with a moving slider to place your pieces in.

There is only one rule: when you add the I piece, you cannot place it vertically. Otherwise it would be cheating, since you could take the solution to the previous puzzle and just stick the I to the side.
For the same reason, the I piece is not used for the 5x5 puzzles, since even if you put it horizontally it would still be a 5x4 rectangle with the I stuck to the side.

In the box there's a list of 12 such sequences that you can follow, each one ending with a different piece.

This is absolutely brilliant. You start with easy puzzles wich practically anyone can solve, and then gradually increase the difficulty until you complete the full puzzle--or run out of patience, whichever comes first.

Every time you complete one of the puzzles there's a sense of achivement--and then immediately of frustration, because to add one more piece you have to throw away the perfect shape you just created, and start again from scratch.

I liked this idea so much that I started thinking about it. How many ways are there to build sequences to play the game? Are the ones provided the best possible, or can they be improved? Let's see what the goal is:

  • build 12 sequences, each one ending with a different piece

  • every combination of pieces must be used only once


Things get interesting from the start. There are only 19 ways to select 4 pieces that can form a 5x4 rectangle; but 6 of them cannot be extended to 5x5 without using the I piece. This leaves us with 13 initial choices, just one more than what we need.

How can we rate our sequences to decide which ones are better than others? The obvious answer is to look at the number of different solutions for each combination of pieces. The less the solutions, the better the puzzle is. If two sequences have the same number of solutions, then the one that contains more puzzles which have a single solution is better.
The 5x11 puzzles are obviously the same for every choice, so we don't have to count them. This leaves us with 12 sets of 7 puzzles (from 5x4 to 5x10) for a total of 84 puzzles, which we want to be as hard as possible.

Turns out that the sequences accompanying my copy of the game aren't very good: there's a total of 3084 solutions, and only 20 puzzles with a single solution. Even worse, one of the puzzles is repeated twice.
It seems that I have and old version of the game (the instructions say (c) 1992-1998). The version currently being produced uses a better list, whith 1556 solutions and 26 single solutions (and no duplicates).

Writing a program to find the best solution is a remarkably difficult task. I'm not even sure if it has been classified. If it is, I'd expect it to be as part of problems related to graph theory.

With a mix of manual pruning and computer search, I've selected this sequence which looks quite good:


L P U F [1] W [1] I [4] Z [1] T [5] N [1] V [83] Y [1277] X [786]
L Y U F [3] Z [1] I [2] N [2] T [4] X [2] V [16] P [ 508] W "
Y N V T [1] F [1] I [3] W [1] U [5] X [1] L [ 8] P [ 371] Z "
L Y P W [5] N [1] V [1] I [5] X [1] Z [5] F [25] U [ 216] T "
Y P U F [1] T [1] I [2] Z [1] N [2] X [1] W [ 2] L [ 183] V "
L Y P T [1] W [1] F [1] N [5] X [3] Z [1] I [ 8] V [ 133] U "
L V P Z [2] Y [1] T [2] I [5] X [1] W [1] F [15] U [ 125] N "
L Y P Z [1] W [2] T [1] V [1] X [1] U [3] N [21] I [ 116] F "
L N P U [2] Z [2] X [1] V [1] F [1] T [2] W [14] Y [ 112] I "
L N V Z [2] U [1] T [1] F [2] I [6] W [3] X [ 1] P [ 101] Y "
Y N P U [1] V [1] Z [1] X [1] F [2] T [1] W [ 1] I [ 59] L "
L Y V U [2] Z [1] F [1] W [1] X [1] N [1] T [ 1] I [ 14] P "


The numbers in square brackets are the solutions for each puzzle.
The total is 331 solutions, and 45 puzzles with a single solution.

I think this should be quite close to the minimum, but I don't have a way to prove it.

Can anyone do better?