In the last post I described a way to unpack the content of a proprietary format. That format was fairly straightforward, with neither encryption nor compression, so it was possible to grab the files stored inside without even touching a debugger (we used one anyway, but it was not necessary). This time, however, things will get a little more interesting. Today we will tackle a game from the Crazy Chicken series, to be more precise, Crazy Chicken Kart 2 (Moorhuhn Kart 2 in the original).

Getting started

First we need to inspect the files in the game’s installation directory to see where the assets are stored. There is a folder named data which contains a file named mhk2-00.dat. It is the largest file, taking about 140 MB of space. This will be our target.

When we open the file in a hex editor we can see this:

hxd

Let’s guess

We can try to guess the structure of the file without using any debugger. Imagine you are a game developer tasked with writing a parser that will unpack the required files and load them into the game’s memory. What kind of information is needed?

  • File count - it is possible to create a file format without it, for example by putting the file header at the end of the content and iterating over elements until you hit EOF (end of file), but generally the file count is part of a file format
  • Location of the entry name/ID - there must be a way to identify each entry. It can be, for example, a numeric ID, or a filename just like a regular file on disk
  • A way to obtain the location of the entry content and its length

There are plenty of ways such information can be stored. Let’s see what we can assume just by looking at the previous image. The very first 16 bytes form the string Moorhuhn Kart 2. We can treat it as a file header that confirms we are dealing with the right file. Multiple filenames can be seen. From the beginning of the first filename to the beginning of the next one there are exactly 0x80 bytes. The same holds for the following files when we scroll down. For now we can assume that each 0x80-byte block describes a single file entry. Inside such a block there are probably our two missing elements: the offset of the entry content and its length. Each block contains two 32-bit integers. One of them points somewhere much further into the file, while the other is a much smaller value, so the first one is most likely the data offset and the second one the data length.

hxd_markings
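To make the guessed layout concrete, here is a minimal sketch of how a single 0x80-byte entry could be decoded. It assumes little-endian integers and places the offset/length pair at 0x68 inside the entry, which matches the script later in this post; the names Entry and parse_entry are only illustrative.

```python
import struct
from dataclasses import dataclass

ENTRY_SIZE = 0x80    # distance between two consecutive filenames
OFFSET_FIELD = 0x68  # assumed position of the two 32-bit integers


@dataclass
class Entry:
    name: str
    offset: int
    length: int


def parse_entry(raw: bytes) -> Entry:
    """Decode one directory entry under the layout guessed above."""
    if len(raw) != ENTRY_SIZE:
        raise ValueError("an entry must be exactly 0x80 bytes long")
    # The name is a null-terminated string at the start of the entry.
    name = raw.split(b"\x00", 1)[0].decode("ascii")
    # Two little-endian unsigned 32-bit integers: data offset, data length.
    offset, length = struct.unpack_from("<II", raw, OFFSET_FIELD)
    return Entry(name, offset, length)


# Synthetic entry built by hand, only to exercise the parser.
raw = b"demo/file.bin".ljust(OFFSET_FIELD, b"\x00")
raw += struct.pack("<II", 0x22B40, 0x100)
raw = raw.ljust(ENTRY_SIZE, b"\x00")
entry = parse_entry(raw)
```

The entry used here is made up; with the real file you would read 0x80 bytes at each entry position and pass them to the function.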

Now, just after the Moorhuhn Kart 2 string we can see the integer 0x456. For now, assume this is the file count. The first file entry starts at offset 0x40, each entry is 0x80 bytes long, and there are 0x456 files. So if 0x456 really is the file count, the last file entry ends at offset 0x40 + 0x456 * 0x80 = 0x22B40. Let’s check! And indeed, 0x22B40 looks like the end of the file entries and, at the same time, the beginning of the data. To be even more sure, let’s look at the data offset of the first file entry: it is also 0x22B40! This is even better proof that 0x456 is the file count and that we are reading the data offset correctly.

22b40
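The arithmetic behind this check is easy to reproduce. The sketch below only restates the numbers from the text; the constant and function names are illustrative.

```python
HEADER_SIZE = 0x40  # offset of the first file entry
ENTRY_SIZE = 0x80   # bytes per file entry
FILE_COUNT = 0x456  # candidate file count (1110 entries)


def directory_end(header_size: int, entry_size: int, count: int) -> int:
    """Return the offset just past the last file entry."""
    return header_size + entry_size * count


end = directory_end(HEADER_SIZE, ENTRY_SIZE, FILE_COUNT)
print(hex(end))  # 0x22b40
```

If the guess were wrong, this value would land in the middle of an entry or somewhere inside the data instead of exactly on the first file’s data offset.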

Unpacking

Putting together all the information obtained above, we can write a simple Python script that iterates over all the file entries, extracts their content, and saves it under the appropriate names.

import struct
from pathlib import Path


file = open("mhk2-00.dat", mode="rb")

# Seek to the file count
file.seek(0x20)
filecount, = struct.unpack("<H", file.read(2))

for i in range(filecount):
    # Seek to the beginning of the header
    file.seek(0x40 + i * 0x80)
    name = b""
    
    # Read null terminated string (also stop at EOF to avoid looping forever)
    while (a := file.read(1)) not in (b"", b"\x00"):
        name += a
        
    print(name)
    # Seek to the offset and length position
    file.seek(0x40 + i * 0x80 + 0x68)
    offset, length = struct.unpack("<II", file.read(8))
    
    filename = name.decode("ascii")
    
    # Create the file directory
    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    
    # Seek to the file content
    file.seek(offset)
    with open(filename, mode="wb") as output:
        content = file.read(length)
        output.write(content)

It looks easy, doesn’t it? So far it might be even easier than the format from the previous blog post. However, at the beginning of this post I promised that this one would be more interesting, and I didn’t lie.

What is inside?

Inside the root directory there are two items: config.txt and a directory named mk2. Let’s take a look inside mk2:

unpacked content

  • items - Contains different textures and presumably 3D objects of different game items
  • karts - Animations of all the playable characters in game
  • lensflares - Textures of flares
  • level0X - Configuration, textures, music, 3D objects and animations for different levels
  • menu - Music and textures to be displayed in main menu
  • misc - Fonts and HUD textures
  • settings - Encrypted configuration of different karts. Looks interesting in terms of modding
  • sfx - Different sound effects, collision, engine etc
  • text.csv - Translation of subtitles in different languages - interesting if you want to translate the game

Examining the results

When we look at the unpacked data, it looks almost right. We can view the images and hear the sound effects. However, when we open a file with the .txt extension, we are presented with gibberish:

encrypted file
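A quick way to confirm that such a file is not plain text, without opening each one by hand, is to measure the share of printable ASCII bytes. This is only a rough heuristic: the 0.95 threshold and the function names below are assumptions made for illustration, not something taken from the game.

```python
import string

# Printable ASCII characters, including common whitespace such as \r and \n.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes that are printable ASCII."""
    if not data:
        return 1.0
    return sum(b in PRINTABLE for b in data) / len(data)


def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Heuristic: treat the data as text if almost every byte is printable."""
    return printable_ratio(data) >= threshold


plain = looks_like_text(b"speed = 120\r\n")
scrambled = looks_like_text(bytes(range(256)))
```

Running such a check over every unpacked .txt file would show at a glance which files are stored in the clear and which are obfuscated.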

It looks like the authors of the game decided to obfuscate the content of text files; otherwise the text could easily be replaced directly in the .dat file, even without unpacking. As there are no checksums in the .dat file, this could lead to cheating (presumably the .txt files contain the configuration of vehicle speeds and similar settings). To read the real content of the file we need to take another approach. Without knowing the algorithm used to decipher the content, it is almost impossible to progress further. Even if the algorithm were known, we would still need to obtain the decryption key somehow. Remember, though, that the game must be able to read and understand the file content, which implies that it knows the deciphering procedure. Since we have access to the game executable, we can discover this procedure too.

Reversing the text file decryption method

Looking at the unpacked files, we can see that there is a directory for each game level. Each of them contains 4 folders: music, objects, settings and textures8bit. Going deeper, inside settings we can find 3 more folders: display, misc and objects. Each of these holds a .txt file whose name matches the directory name, so inside display you can find display.txt, and so on. As mentioned previously, all the .txt files are encrypted.

Right now we are not interested in the way the game parses mhk2-00.dat, because we already know it. By listing all string references in the executable, we can spot references to objects.txt.

string references

This string is probably used to fetch the configuration file of the level we want to play. Let’s put breakpoints on all the references and run the game.

breakpoint objects

After we hit a breakpoint, let’s put another one on the ReadFile WinAPI call. It is possible that the whole file was read into memory at program start, but loading all the level data at once would be a waste of precious RAM (especially in 2003, when the game was created), so it is more likely that levels are read from disk as needed. As we resume execution, we hit the breakpoint on ReadFile, and the buffer is filled with the encrypted data, just as desired.

read file

Interestingly, the first objects.txt to load is from the level06 directory. Nevertheless, we continue our journey to discover the decryption routine. To do so, we need to put a hardware breakpoint on access at the beginning of the buffer filled with encrypted data.

decryption routine

And boom, we landed in a function that performs XOR and shift operations, which looks like some kind of decryption routine. Since it is hard to judge at first glance what exactly this function is doing, it is a good idea to let a decompiler do the work for us. After opening the game executable in IDA and navigating to the same address (0x0450384) that we saw in the debugger, we are presented with this view:

normal ida

It is not really clean, but after renaming some variables and changing their types it looks much better:

clean ida

Let’s add this function to our Python script. Take a look at the last if statement in the script:

import struct
from pathlib import Path


file = open("mhk2-00.dat", mode="rb")

# Seek to the file count
file.seek(0x20)
filecount, = struct.unpack("<H", file.read(2))

for i in range(filecount):
    # Seek to the beginning of the header
    file.seek(0x40 + i * 0x80)
    name = b""
    
    # Read null terminated string (also stop at EOF to avoid looping forever)
    while (a := file.read(1)) not in (b"", b"\x00"):
        name += a
        
    print(name)
    # Seek to the offset and length position
    file.seek(0x40 + i * 0x80 + 0x68)
    offset, length = struct.unpack("<II", file.read(8))
    
    filename = name.decode("ascii")
    
    # Create the file directory
    Path(filename).parent.mkdir(parents=True, exist_ok=True)
    
    # Seek to the file content
    file.seek(offset)
    with open(filename, mode="wb") as output:
        content = file.read(length)
        
        # If filename ends with .txt perform the decryption
        if filename.endswith(".txt"):
            key = 0x1234
            for b in content:
                result = (2 * (b ^ key)) & 0xFF
                decrypted_char = result ^ (result ^ ((b ^ key) >> 1)) & 0x55
                key = (3 * key + 2) & 0xFFFF
                output.write(bytes([decrypted_char]))
        else:
            output.write(content)

And check how it works now:

decrypted content

As you can see our effort was worth it! Now the text file is decrypted and we are able to read its content.
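For modding it can be handy to have the routine as a standalone function together with its inverse. The sketch below rewrites the per-byte operation from the script as an explicit swap of adjacent bit pairs, which gives the same result for every byte. The encrypt function is derived by inverting that arithmetic; it was not taken from the game executable, so treat it as an assumption until tested against the game. All function names are illustrative.

```python
INITIAL_KEY = 0x1234  # starting key value used by the script above


def swap_adjacent_bits(value: int) -> int:
    """Swap bits 0<->1, 2<->3, 4<->5 and 6<->7 of the low byte."""
    value &= 0xFF
    return ((value << 1) & 0xAA) | ((value >> 1) & 0x55)


def next_key(key: int) -> int:
    """Advance the 16-bit rolling key."""
    return (3 * key + 2) & 0xFFFF


def decrypt(data: bytes) -> bytes:
    """XOR each byte with the rolling key, then swap adjacent bits."""
    key = INITIAL_KEY
    out = bytearray()
    for b in data:
        out.append(swap_adjacent_bits(b ^ key))
        key = next_key(key)
    return bytes(out)


def encrypt(data: bytes) -> bytes:
    """Inverse of decrypt: swap adjacent bits, then XOR with the key."""
    key = INITIAL_KEY
    out = bytearray()
    for b in data:
        out.append(swap_adjacent_bits(b) ^ (key & 0xFF))
        key = next_key(key)
    return bytes(out)


sample = b"speed = 120\r\n"
roundtrip = decrypt(encrypt(sample))
```

Because swapping adjacent bits is its own inverse, encryption only needs the same two steps applied in the opposite order. A modified text file would still have to be written back into the .dat file with a matching offset and length.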

Conclusion

Depending on the method used by the game developer, it is sometimes not possible to unpack data files just by guessing the file structure. When we are dealing with obfuscation and/or encryption, we need to reverse engineer the executable to obtain the decryption method. Thanks for reading.