AudioCrypto SunshineCTF 2021 Writeup

0awawa0
6 min readSep 21, 2021

--

Foreword

So I’ve participated the SunshineCTF 2021 that took place last weekend. I focused on crypto tasks only. There were only three of them, and here I will walk through the solution to the most interesting one — AudioCrypto. The other two were actually pretty boring (trivial Base64 encoding and not as trivial but not much harder common modulus RSA).

Actually I didn’t manage to solve this task during the competition and generally I don’t publish tasks I solve afterwards. But I really think that’s an interesting task and it seems that there are no detailed writeups on it (actually the only one I found is written by task designers: link) so I’ve decided to write one. I think that might be really helpful for CTF beginners.

Prerequisites

For this task we are given two files: flag_encrypted.wav and encrypt_wav.py.

Solution

Well the flag_encrypted.wav looks really encrypted

And inside the encrypt_wav.py we can see the following script.

We start analyzing the code from function encrypt_audio(). First several lines of code does pretty boring and totally unrelated to crypto stuff:

  1. Open input sound file for reading
  2. Open output sound file for writing
  3. Set output file parameters similar for input file.

Next, there is line cryptStream = LCGCryptStream(). We will look at that class later. After that all audio frames from input files are read to frames variable:

frames = inpaudio.readframes(inpaudio.getnframes())

What happens next is very important for us — frames array is padded with 0x42 bytes (first, 1000 of them are added to array 0x42 is being added until array size is divisible by 1000). Let’s stress out that now we know that frames contains at least 1000 0x42 bytes at the end.

After all there is a following loop, which encrypts each audio frame separately and write result to the output file:

for i in range(0, len(frames), 2):
frame = frames[i: i + 2] # Single frame is two bytes long
# Here is useless print statement
encaudio.writeframes(cryptStream.encryptFrame(frame))

Now it is time to look at the LCGCryptStream class as that’s where all the actual encryption happens.

The class object is initialized as follows:

  1. Pick four random values M, A, B and state.
  2. Call next_byte() function 100 times and ensure that it doesn’t return any duplicates. If it does, go back at step 1.

The function next_byte() is pretty interesting. It performs calculation self.state = (self.state * self.a + self.b) % self.m until the value of self.state become less than 256 and returns this value. So function next_byte() returns some numbers from sequence that are generated with the formula:

self.state = (self.state * self.a + self.b) % self.m

To get better look at it let’s rewrite it as follows:

This is the formula for Linear Congruential Generator (LCG). We will get back to that formula in a minute. But now let’s consider the last relevant piece of code — encryptFrame() function:

What it does is actually just a simple XORing of frame value with two bytes generated with next_byte() function (most significant 8 bits XORed with first byte and least significant 8 bits — with second byte). So basically that’s just a bytewise XOR encryption and that’s all we care about here.

Okay, now that we’ve analyzed all the code we have to figure out how to decrypt the encrypted audio. To do that we’d like to know the keystream, which is generated by LCG. And we know nothing nor about the parameters A, B or M, neither about the initial state of the generator. Bruteforcing those parameters, though possible, would take too much time (top border for each parameter is 2^12, so it might take 2^12 * 2^12 * 2^12 * 2^12 = 2^48 steps to find the correct combination).

Notice that we know that plaintext audio data ended with at least 1000 0x42 bytes because of the padding. Now using the XOR properties we can determine last 1000 values function next_byte() returned as follows just XORing 0x42 with last 1000 bytes of the encrypted audio. But would that be helpful? Even if we know last 1000 values, is it possible for us to determine any of the previous values returned from next_byte()?

Let’s look again at the LCG formula

Notice that all the calculations are performed modulo M. So there are at most M distinct values that could be calculated by this formula (0..M-1), and after at most M calls to next_byte() the sequence will make a cycle. Here is the example with small numbers:

>>> S = 5
>>> A = 9
>>> B = 7
>>> M = 13
>>> for i in range(20):
... S = (S * A + B) % M
... print(S)
0
7
5
0
7
5
0
7
5
0
7
5
0
7
5
0
7
5
0
7
>>>

Apparently there are only 3 distinct values that can be generated with our parameters: 0, 7, 5. And after that the sequence is cycled.

Using this property of LCG we could restore the whole keystream generated with LCG. Unfortunately the LCGCryptStream uses some random value for M from range 2^9..2^12. So potentially we might need 2^12 values to find the cycle (and we have only 1000). But, we should notice that next_byte() returns self.state only if it has value less than 256. That means that it would take at most 256 calls to next_byte() to find the cycle. And that’s much lower than 1000 values that we have.

All right, let’s write some code now. First, we are going to find last 1000 bytes generated by next_byte() for that we will XOR 1000 last bytes of audio data with 0x42:

On the screenshot above we can clearly see cycles in the keystream. Now we have the actual keystream that was used to encrypt the whole audio data. But there is one more problem — we do not know at which point the stream started. The stream on the screenshot starts with 18 but that’s the byte at the position 1000 from the end of audio data. But actual keystream could start with 135, or 177, or any other value that lay between consecutive 18. Here we could just try to decrypt backwards (we know for sure the last value of the key stream), but let’s determine the actual stream start. For that we are going to decrypt the whole audio file with our keystream starting from positions 0, 1, ..., len(cycle). And as soon as we see 1000 0x42 values at the end of decrypted data, we can tell for sure that the data was decrypted correctly, so we save it to the output file

And now we can see perfectly fine audio stream

When we listen to this we can hear the flag on top of classic Never Gonna Give You Up by Rick Astley.

--

--