AudioCrypto SunshineCTF 2021 Writeup
So I’ve participated the SunshineCTF 2021 that took place last weekend. I focused on crypto tasks only. There were only three of them, and here I will walk through the solution to the most interesting one — AudioCrypto. The other two were actually pretty boring (trivial Base64 encoding and not as trivial but not much harder common modulus RSA).
Actually I didn’t manage to solve this task during the competition and generally I don’t publish tasks I solve afterwards. But I really think that’s an interesting task and it seems that there are no detailed writeups on it (actually the only one I found is written by task designers: link) so I’ve decided to write one. I think that might be really helpful for CTF beginners.
For this task we are given two files:
flag_encrypted.wav looks really encrypted
And inside the
encrypt_wav.py we can see the following script.
We start analyzing the code from function
encrypt_audio(). First several lines of code does pretty boring and totally unrelated to crypto stuff:
- Open input sound file for reading
- Open output sound file for writing
- Set output file parameters similar for input file.
Next, there is line
cryptStream = LCGCryptStream(). We will look at that class later. After that all audio frames from input files are read to
frames = inpaudio.readframes(inpaudio.getnframes())
What happens next is very important for us —
frames array is padded with
0x42 bytes (first, 1000 of them are added to array
0x42 is being added until array size is divisible by 1000). Let’s stress out that now we know that
frames contains at least 1000
0x42 bytes at the end.
After all there is a following loop, which encrypts each audio frame separately and write result to the output file:
for i in range(0, len(frames), 2):
frame = frames[i: i + 2] # Single frame is two bytes long
# Here is useless print statement
Now it is time to look at the
LCGCryptStream class as that’s where all the actual encryption happens.
The class object is initialized as follows:
- Pick four random values M, A, B and state.
next_byte()function 100 times and ensure that it doesn’t return any duplicates. If it does, go back at step 1.
next_byte() is pretty interesting. It performs calculation
self.state = (self.state * self.a + self.b) % self.m until the value of
self.state become less than 256 and returns this value. So function
next_byte() returns some numbers from sequence that are generated with the formula:
self.state = (self.state * self.a + self.b) % self.m
To get better look at it let’s rewrite it as follows:
This is the formula for Linear Congruential Generator (LCG). We will get back to that formula in a minute. But now let’s consider the last relevant piece of code —
What it does is actually just a simple XORing of frame value with two bytes generated with
next_byte() function (most significant 8 bits XORed with first byte and least significant 8 bits — with second byte). So basically that’s just a bytewise XOR encryption and that’s all we care about here.
Okay, now that we’ve analyzed all the code we have to figure out how to decrypt the encrypted audio. To do that we’d like to know the keystream, which is generated by LCG. And we know nothing nor about the parameters A, B or M, neither about the initial state of the generator. Bruteforcing those parameters, though possible, would take too much time (top border for each parameter is
2^12, so it might take
2^12 * 2^12 * 2^12 * 2^12 = 2^48 steps to find the correct combination).
Notice that we know that plaintext audio data ended with at least 1000
0x42 bytes because of the padding. Now using the XOR properties we can determine last 1000 values function
next_byte() returned as follows just XORing
0x42 with last 1000 bytes of the encrypted audio. But would that be helpful? Even if we know last 1000 values, is it possible for us to determine any of the previous values returned from
Let’s look again at the LCG formula
Notice that all the calculations are performed modulo M. So there are at most M distinct values that could be calculated by this formula (
0..M-1), and after at most M calls to
next_byte() the sequence will make a cycle. Here is the example with small numbers:
>>> S = 5
>>> A = 9
>>> B = 7
>>> M = 13
>>> for i in range(20):
... S = (S * A + B) % M
Apparently there are only 3 distinct values that can be generated with our parameters: 0, 7, 5. And after that the sequence is cycled.
Using this property of LCG we could restore the whole keystream generated with LCG. Unfortunately the
LCGCryptStream uses some random value for M from range
2^9..2^12. So potentially we might need
2^12 values to find the cycle (and we have only 1000). But, we should notice that
self.state only if it has value less than 256. That means that it would take at most 256 calls to
next_byte() to find the cycle. And that’s much lower than 1000 values that we have.
All right, let’s write some code now. First, we are going to find last 1000 bytes generated by
next_byte() for that we will XOR 1000 last bytes of audio data with
On the screenshot above we can clearly see cycles in the keystream. Now we have the actual keystream that was used to encrypt the whole audio data. But there is one more problem — we do not know at which point the stream started. The stream on the screenshot starts with
18 but that’s the byte at the position 1000 from the end of audio data. But actual keystream could start with
177, or any other value that lay between consecutive
18. Here we could just try to decrypt backwards (we know for sure the last value of the key stream), but let’s determine the actual stream start. For that we are going to decrypt the whole audio file with our keystream starting from positions
0, 1, ..., len(cycle). And as soon as we see 1000
0x42 values at the end of decrypted data, we can tell for sure that the data was decrypted correctly, so we save it to the output file
And now we can see perfectly fine audio stream
When we listen to this we can hear the flag on top of classic Never Gonna Give You Up by Rick Astley.