## Foreword

So I’ve participated the SunshineCTF 2021 that took place last weekend. I focused on crypto tasks only. There were only three of them, and here I will walk through the solution to the most interesting one — AudioCrypto. The other two were actually pretty boring (trivial Base64 encoding and not as trivial but not much harder common modulus RSA).

Actually I didn’t manage to solve this task during the competition and generally I don’t publish tasks I solve afterwards. But I really think that’s an interesting task and it seems that there are no detailed writeups on it (actually the only one I found is written by task designers: link) so I’ve decided to write one. I think that might be really helpful for CTF beginners.

## Prerequisites

For this task we are given two files: `flag_encrypted.wav`

and `encrypt_wav.py`

.

## Solution

Well the `flag_encrypted.wav`

looks really encrypted

And inside the `encrypt_wav.py`

we can see the following script.

We start analyzing the code from function `encrypt_audio()`

. First several lines of code does pretty boring and totally unrelated to crypto stuff:

- Open input sound file for reading
- Open output sound file for writing
- Set output file parameters similar for input file.

Next, there is line `cryptStream = LCGCryptStream()`

. We will look at that class later. After that all audio frames from input files are read to `frames`

variable:

`frames = inpaudio.readframes(inpaudio.getnframes())`

What happens next is very important for us — `frames`

array is padded with `0x42`

bytes (first, 1000 of them are added to array `0x42`

is being added until array size is divisible by 1000). Let’s stress out that now we know that `frames`

contains at least 1000 `0x42`

bytes at the end.

After all there is a following loop, which encrypts each audio frame separately and write result to the output file:

`for i in range(0, len(frames), 2):`

frame = frames[i: i + 2] # Single frame is two bytes long

# Here is useless print statement

encaudio.writeframes(cryptStream.encryptFrame(frame))

Now it is time to look at the `LCGCryptStream`

class as that’s where all the actual encryption happens.

The class object is initialized as follows:

- Pick four random values M, A, B and state.
- Call
`next_byte()`

function 100 times and ensure that it doesn’t return any duplicates. If it does, go back at step 1.

The function `next_byte()`

is pretty interesting. It performs calculation `self.state = (self.state * self.a + self.b) % self.m`

until the value of `self.state`

become less than 256 and returns this value. So function `next_byte()`

returns some numbers from sequence that are generated with the formula:

`self.state = (self.state * self.a + self.b) % self.m`

To get better look at it let’s rewrite it as follows:

This is the formula for Linear Congruential Generator (LCG). We will get back to that formula in a minute. But now let’s consider the last relevant piece of code — `encryptFrame()`

function:

What it does is actually just a simple XORing of frame value with two bytes generated with `next_byte()`

function (most significant 8 bits XORed with first byte and least significant 8 bits — with second byte). So basically that’s just a bytewise XOR encryption and that’s all we care about here.

Okay, now that we’ve analyzed all the code we have to figure out how to decrypt the encrypted audio. To do that we’d like to know the keystream, which is generated by LCG. And we know nothing nor about the parameters A, B or M, neither about the initial state of the generator. Bruteforcing those parameters, though possible, would take too much time (top border for each parameter is `2^12`

, so it might take `2^12 * 2^12 * 2^12 * 2^12 = 2^48`

steps to find the correct combination).

Notice that we know that plaintext audio data ended with at least 1000 `0x42`

bytes because of the padding. Now using the XOR properties we can determine last 1000 values function `next_byte()`

returned as follows just XORing `0x42`

with last 1000 bytes of the encrypted audio. But would that be helpful? Even if we know last 1000 values, is it possible for us to determine any of the previous values returned from `next_byte()`

?

Let’s look again at the LCG formula

Notice that all the calculations are performed modulo M. So there are at most M distinct values that could be calculated by this formula (`0..M-1`

), and after at most M calls to `next_byte()`

the sequence will make a cycle. Here is the example with small numbers:

`>>> S = 5`

>>> A = 9

>>> B = 7

>>> M = 13

>>> for i in range(20):

... S = (S * A + B) % M

... print(S)

0

7

5

0

7

5

0

7

5

0

7

5

0

7

5

0

7

5

0

7

>>>

Apparently there are only 3 distinct values that can be generated with our parameters: 0, 7, 5. And after that the sequence is cycled.

Using this property of LCG we could restore the whole keystream generated with LCG. Unfortunately the `LCGCryptStream`

uses some random value for M from range `2^9..2^12`

. So potentially we might need `2^12`

values to find the cycle (and we have only 1000). But, we should notice that `next_byte()`

returns `self.state`

**only **if it has value less than 256. That means that it would take at most 256 calls to `next_byte()`

to find the cycle. And that’s much lower than 1000 values that we have.

All right, let’s write some code now. First, we are going to find last 1000 bytes generated by `next_byte()`

for that we will XOR 1000 last bytes of audio data with `0x42`

:

On the screenshot above we can clearly see cycles in the keystream. Now we have the actual keystream that was used to encrypt the whole audio data. But there is one more problem — we do not know at which point the stream started. The stream on the screenshot starts with `18`

but that’s the byte at the position 1000 from the end of audio data. But actual keystream could start with `135`

, or `177`

, or any other value that lay between consecutive `18`

. Here we could just try to decrypt backwards (we know for sure the last value of the key stream), but let’s determine the actual stream start. For that we are going to decrypt the whole audio file with our keystream starting from positions `0, 1, ..., len(cycle)`

. And as soon as we see 1000 `0x42`

values at the end of decrypted data, we can tell for sure that the data was decrypted correctly, so we save it to the output file

And now we can see perfectly fine audio stream

When we listen to this we can hear the flag on top of classic Never Gonna Give You Up by Rick Astley.