You have probably heard of Garfield, the orange cat who loves sleep and lasagna. It turns out he is the protagonist not only of comics and cartoons, but also of computer games. One such game, uniquely named Garfield, was released for PC in 2004 by a company named The Code Monkeys. In this post, we will take a look at the structure of the files used by the game.

Getting started

Files after installation

After installation, inside the main directory we can see the data folder, where all the assets lie. The content of audio and fmv is easily accessible, since it is not packed and the files inside use widely known formats, such as .wav. More interesting things, such as images and 3D models, are probably hidden inside the mysterious .pak files. Since this is most likely a proprietary format, the content of those files is inaccessible to most people. Not to us, however.

Pak in hex editor

The first thing to do when dealing with an unknown file format is to open such a file in your favourite hex editor. As we can see, the first 4 bytes, when converted to ASCII, are equal to “PACK”. As this value is constant among all the .pak files, it is most likely a magic header, so the game parser knows that it is dealing with the right kind of file. In the next bytes we can spot the “PNG” string, and just before it there is a byte equal to 0x89. Together they form the beginning of a .png image. So we know that there is one packed inside this file, and we also know that it is neither compressed nor encrypted (otherwise we wouldn’t see the raw PNG header). But how long is it? Does this image have any index or name? Is it the only image in this .pak file?
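The same observation can be checked programmatically. The sketch below is only an illustration: the helper name inspect_pak_start and the sample bytes are made up for this example (they are not real game data), and the only facts it relies on are the “PACK” magic described above and the standard 8-byte PNG signature.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG file signature


def inspect_pak_start(data: bytes) -> dict:
    """Report whether the data starts with PACK and where the first PNG begins."""
    return {
        "has_pack_magic": data[:4] == b"PACK",
        "first_png_offset": data.find(PNG_SIGNATURE),  # -1 if no PNG is found
    }


# Synthetic stand-in for the first bytes of a .pak file (hypothetical data):
# 4-byte magic, 8 placeholder bytes, then the beginning of a PNG image.
sample = b"PACK" + bytes(8) + PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"
report = inspect_pak_start(sample)
print(report)
```

To try it on a real file, read the first few kilobytes with open(path, "rb") and pass them to the helper.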

The unknown bytes

Between “PACK” and “PNG” we have 8 unknown bytes. The whole file is 0x1D8018 bytes long. When we interpret the first 4 bytes after “PACK” as a little-endian value, it looks like some kind of offset pointing almost to the end of the file: 0x1D5518. When we go there, we can see the name of a .png file, so our suspicions were right. It is indeed an offset to some valuable information.

Filename in hex editor

We still have 4 unknown bytes. We could read them in multiple ways. As before, they could be a 4-byte little-endian integer, but maybe we should read them byte by byte? Or maybe they are two 2-byte integers? What does this value represent? We still don’t know how to read the size of that PNG image. Questions arise, but is there a way to answer them without guessing?
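To make the competing interpretations concrete, here is a small sketch using Python's struct module. The first value is the offset quoted above; the second 4 bytes are a made-up placeholder, because the actual value is only visible in the screenshot.

```python
import struct

# The 8 bytes that follow "PACK". The offset 0x1D5518 comes from the text;
# the remaining 4 bytes (2A 00 00 00) are a hypothetical placeholder.
unknown = struct.pack("<I", 0x1D5518) + bytes([0x2A, 0x00, 0x00, 0x00])

# Interpretation 1: two 4-byte little-endian integers
offset, as_u32 = struct.unpack("<II", unknown)

# Interpretation 2: one 4-byte integer followed by two 2-byte integers
_, lo_u16, hi_u16 = struct.unpack("<IHH", unknown)

# Interpretation 3: the last 4 bytes read one by one
as_bytes = tuple(unknown[4:])

print(hex(offset), as_u32, (lo_u16, hi_u16), as_bytes)
```

All three readings are valid for the same bytes, which is why the file alone cannot tell us which one the game uses.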

It turns out that this can indeed be solved without guessing. Don’t get me wrong: it is perfectly fine to solve it by trial and error (which we will partly come back to later in this post), especially for a file format that looks this easy. But to make it more entertaining and more educational, let me introduce a more powerful method that will answer our questions.

Fun with debugger

Let’s attach a debugger to the game. Obviously we don’t have the source code, so we are forced to debug it at the assembly level. For this task I will use x64dbg. The first question that immediately comes to mind is where to look. The game binary is huge and contains millions of instructions, most of which are irrelevant to us. A good first idea is to look at the string references. There is a high probability that at least one of the .pak filenames is hardcoded and used as a parameter to the “open file” function.

x64dbg strings button

To look for string references in x64dbg we can use the button shown above. Please note that the references are searched per module, so you need to be sure you are in the main one. If you want to change the current module, you can go to the Symbols tab and then double-click the module you are interested in.

x64dbg pak string

And here it is. As the function is in the export table (a bit unusual for an .exe file), we can see its name, so we can be even more certain that we are looking at the right place.
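As a rough static counterpart to the debugger's string search, printable strings ending in .pak can also be pulled straight out of a binary. This is only a sketch: it finds the strings themselves, not the code that references them, and the sample bytes and the data\startup.pak path are hypothetical.

```python
import re

# Printable ASCII run (1 to 64 characters) followed by ".pak"
PAK_NAME = re.compile(rb"[\x20-\x7e]{1,64}\.pak", re.IGNORECASE)


def find_pak_strings(binary: bytes) -> list:
    """Return (offset, string) pairs for every .pak-like name in the binary."""
    return [(m.start(), m.group().decode("ascii")) for m in PAK_NAME.finditer(binary)]


# Hypothetical fragment of an executable: filler bytes around two strings
sample = b"\x00\x01\x02\x03data\\startup.pak\x00\x90\x90attic.pak\x00"
hits = find_pak_strings(sample)
for offset, name in hits:
    print(hex(offset), name)
```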

Reversing the Load_PakFile function

x64dbg loadpakfile

First, we put a software breakpoint inside this function and resume program execution. The breakpoint is indeed hit, and we can step through the function. Now the question is: what does this function do with the file data, and how does it treat our unknown bytes? Before we answer this, we need to understand how the game transfers bytes from our hard disk to RAM in the first place.

Reading file to RAM

You see, on modern platforms like Windows it is impossible for a user-mode application (like a game) to talk directly to the hardware (like a hard disk). To retrieve information from hardware, apps need to send requests to the kernel. On Windows such requests can be sent using WinAPI functions. So if you are creating an application that works with files on Windows, it will eventually use some WinAPI calls under the hood, even if you aren’t aware of it. For the purpose of our task, we are interested in the functions that give us access to files. There are probably multiple ways to do this using WinAPI, but the most popular way is to use CreateFile to open a handle, and then ReadFile to read data into RAM.

Breakpoints on API calls

To put a breakpoint on an API call in x64dbg you can press Ctrl+G and then type the name of the API function. In our case these are CreateFileA and ReadFile. Just one more thing to note. On Windows, a WinAPI function that accepts a string as a parameter comes in two versions, ASCII and Unicode. The ASCII ones end with A, and the Unicode ones with W. That is the reason why there are two functions for opening a handle to a file: CreateFileA and CreateFileW. The ASCII function typically ends up calling the Unicode one, so if you are not sure which version is used by the application, it is safer to put the breakpoint on the Unicode call first. For this game, however, you can safely put the breakpoint on the ASCII one, as this is the function the game uses.

Digging deeper

As you remember, we are now paused at the beginning of Load_PakFile. After stepping over the third call we stop on CreateFileA. Looks good so far: this means that this part of the function is there to open a handle to the file. We go back to the function. While going back we also notice that GetFileSize and SetFilePointer are called. This might be useful later.

x64dbg getting back from create file

At this point we are just after the third call.

x64dbg just after third call

As we have seen that SetFilePointer is used, I place a breakpoint there too and continue stepping. And it pays off: after a few calls we stop at SetFilePointer. From the WinAPI documentation we know that this function accepts four parameters. We can look at the stack to see what their values are.

x64dbg set file pointer

The first one is a number that identifies the file handle; it may be different on each run. The next two parameters are zero, and the last one is equal to 1 - FILE_CURRENT. So after this call the file cursor will be moved by zero bytes, counting from the current position. Looks kinda useless. We continue stepping.
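One possible explanation: SetFilePointer returns the new cursor position, so moving by zero bytes from the current position is a common way to simply ask where the cursor is. Whether the game uses the return value this way is an assumption, as I did not verify it. The equivalent idiom in Python, shown on an in-memory stream with made-up content, looks like this:

```python
import io
import os

# In-memory stand-in for an opened .pak file (hypothetical content)
stream = io.BytesIO(b"PACK" + bytes(8))
stream.read(4)  # consume the 4-byte magic

# Move by 0 bytes relative to the current position. Like SetFilePointer
# with FILE_CURRENT, seek() returns the resulting absolute position.
position = stream.seek(0, os.SEEK_CUR)
print(position, stream.tell())
```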

x64dbg after set file pointer

And we finally land at ReadFile, the first call that reads file data into RAM. Developers might use multiple different techniques here. For example, they can read the data chunk by chunk, or read everything at once and then parse it further on in the program. In this case it looks like they decided to read only as many bytes as they need at the moment. Look at the parameters in the WinAPI documentation and compare them with what we see on the stack:

x64dbg readfile stack

So they want to read 4 bytes from the .pak file into a data buffer at 0x18FDBC. After we go back to Load_PakFile, we see that the bytes read from the file are compared with 0x4B434150, or PACK in ASCII. So we were right. PACK is a header that lets the game parser do a sanity check and be sure that it is processing a valid .pak file.
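The constant looks reversed compared to the text in the file because x86 stores integers in little-endian order. A quick check in Python (the name PACK_MAGIC is just a label chosen for this example):

```python
import struct

PACK_MAGIC = 0x4B434150  # the constant the game compares against

# Packing the integer as little-endian gives the bytes as they lie in the file
magic_bytes = struct.pack("<I", PACK_MAGIC)
print(magic_bytes)

# And the other way round: the file bytes read as a little-endian integer
magic_value = int.from_bytes(b"PACK", "little")
print(hex(magic_value))
```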

x64dbg compare pack

Again, we continue with the flow and see another call to ReadFile. Another 4 bytes are being read; as you remember, this is the offset to the first file name in the .pak file. Then another 4 bytes are read. Those are the ones “unknown” to us. Now we see that they should be treated as a single integer. It is interesting that the value is shifted left by 6 bits, or in other words multiplied by 0x40 (64 in decimal).

x64dbg unknown value

At the end of the image above there is a call to [eax+108]. This is a wrapper for RtlAllocateHeap (I stepped into it, so I know), which allocates a buffer with a length equal to the value of the unknown bytes shifted left by 6. The next call sets the file cursor to the first file name offset, and then, in a loop, the game reads the file in 64-byte chunks into the previously allocated buffer. The loop is executed as many times as the value of the 4-byte “unknown” integer.
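The behaviour observed in the debugger can be mirrored in a few lines of Python. This is a sketch of what the disassembly appears to do, not the game's actual code: the function name read_entry_table and the fake file content are invented for the example.

```python
import io

ENTRY_SIZE = 0x40  # 64 bytes, the chunk size seen in the loop


def read_entry_table(stream, table_offset: int, count: int) -> bytes:
    """Allocate count << 6 bytes and fill them with `count` 64-byte chunks."""
    buffer = bytearray(count << 6)  # same as count * 0x40
    stream.seek(table_offset)
    for i in range(count):
        chunk = stream.read(ENTRY_SIZE)
        if len(chunk) != ENTRY_SIZE:
            raise EOFError("file ended in the middle of the table")
        buffer[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE] = chunk
    return bytes(buffer)


# Fake file: 12 header bytes followed by two 64-byte chunks (made-up data)
fake_file = io.BytesIO(bytes(12) + bytes([1]) * 64 + bytes([2]) * 64)
table = read_entry_table(fake_file, table_offset=12, count=2)
print(len(table))
```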

x64dbg loop

Using obtained information

We could reverse the game further to check what is done with the data that now sits in the allocated buffer. However, we already have some new information that we can apply. Firstly, the previously unknown bytes are most likely the count of files inside the .pak file. Secondly, the offset to the first filename is in reality the offset to a table of file information entries, and each entry is probably 64 bytes long (as the game was reading the table in chunks of that length).
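With this in mind, the 12-byte header can be described as a small structure, and the two numbers from the hex editor give an upper bound for the file count. The sketch below assumes that the table runs to the very end of the file, which has not been verified, and the PakHeader and parse_header names are made up for this example.

```python
import struct
from typing import NamedTuple


class PakHeader(NamedTuple):
    magic: bytes       # expected to be b"PACK"
    table_offset: int  # offset of the file information table
    file_count: int    # number of 64-byte entries in the table


def parse_header(raw: bytes) -> PakHeader:
    """Parse the first 12 bytes of a .pak file."""
    return PakHeader(*struct.unpack("<4sII", raw[:12]))


FILE_SIZE = 0x1D8018     # size of the file seen in the hex editor
TABLE_OFFSET = 0x1D5518  # offset read from the header

# If the table extends to the end of the file (an assumption), this many
# 64-byte entries would fit in it:
max_entries = (FILE_SIZE - TABLE_OFFSET) // 0x40
print(max_entries)

# Round trip on a synthetic header built from the values above
header = parse_header(struct.pack("<4sII", b"PACK", TABLE_OFFSET, max_entries))
print(header)
```

Note that the remaining space divides evenly into 64-byte entries, which fits the idea of a table of fixed-size records.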

Inspecting the file entry

Hex file entry

Now, go back to the binary file at that filename offset. We see that the entry begins with the filename. The length of the name is nowhere to be found, so we can suspect that it should be read until the first null character. The last 8 bytes of the entry look like two 4-byte integers holding the same value, repeated for some reason. This value looks like another offset or a file length. The same goes for the preceding 4 bytes: they look like another offset that might be the start of the file, as it increases with each following entry.
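Put together, a single 64-byte entry could be decoded as in the sketch below. The field names and the sample entry are hypothetical; the layout (null-terminated name at the start, three 4-byte integers at 0x34, 0x38 and 0x3C) is the guess described above, not a confirmed specification.

```python
import struct
from typing import NamedTuple


class PakEntry(NamedTuple):
    name: str
    start_offset: int  # suspected start of the file data
    length: int        # suspected length of the file data
    length_copy: int   # the same value stored a second time


def parse_entry(entry: bytes) -> PakEntry:
    """Decode one 64-byte file entry using the suspected layout."""
    if len(entry) != 0x40:
        raise ValueError("a file entry must be exactly 64 bytes long")
    # The name occupies the first 0x34 bytes and ends at the first null byte
    name = entry[:0x34].split(b"\x00", 1)[0].decode("ascii")
    start, length, length_copy = struct.unpack_from("<III", entry, 0x34)
    return PakEntry(name, start, length, length_copy)


# Made-up entry: a name padded with nulls, then offset and length (twice)
raw = b"images/logo.png".ljust(0x34, b"\x00") + struct.pack("<III", 0x0C, 0x1234, 0x1234)
entry = parse_entry(raw)
print(entry)
```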

Now we go to the suspected file start offset, and from that place we select the next X bytes, where X is equal to the value of the last 4 bytes of the file entry (the suspected file length). We save those bytes to a separate file with the png extension and try to open it:

nopad

We are greeted with an image. Now we can write a simple Python script to automate the unpacking process, and also to verify that it works for all the files.

Code to unpack the .pak file

import os
import struct
from pathlib import Path

destination = "output/"
ENTRY_SIZE = 0x40  # each file entry in the table is 64 bytes long

# Set the filename to unpack here
with open("startup.pak", mode="rb") as file:
    magic, files_info, file_count = struct.unpack("<4sII", file.read(4 * 3))

    # Sanity check, the same one the game performs
    if magic != b"PACK":
        raise ValueError("Not a .pak file: PACK header is missing")

    for i in range(file_count):
        # Go to the entry of file i and read all 64 bytes of it
        file.seek(files_info + i * ENTRY_SIZE)
        entry = file.read(ENTRY_SIZE)

        # The filename starts the entry and ends at the first 0x00 byte
        filename = entry[:0x34].split(b"\x00", 1)[0].decode("ascii")
        print("Unpacking {}...".format(filename))

        # The file placement information is stored at offset 0x34 of the entry
        start_offset, length = struct.unpack_from("<II", entry, 0x34)

        # Create required directories
        output_path = destination + filename
        os.makedirs(Path(output_path).parent, exist_ok=True)

        # Go to the beginning of the file data and write it to the output file
        file.seek(start_offset)
        with open(output_path, mode="wb") as output_file:
            output_file.write(file.read(length))

What is inside?

unpacked

All the files have been unpacked, and it looks like it worked correctly, as the txt files can be read and so can the png images. For the purpose of game modding, the fonts directory may be interesting, as it contains fonts and dialogs in different languages. For example, if you would like to translate the game into your language, you could modify those dialogs and then repack the files. Interestingly, when I run the game the language is set to Polish and it looks like it cannot be changed, yet there is no Polish font in the fonts folder. It turns out that the file named english.lng contains the translated Polish dialogs.

To prove that this script works for other files, I unpacked some of them, like attic.pak. Its content is a little bit different and consists of files with extensions such as:

  • png - Probably obvious: images of some assets, loading screens, etc.
  • rws - The most mysterious file extension; it probably contains animations and textures
  • ape - A plain text file that describes where the checkpoints are on the map, plus some information to assist the AI
  • bin - Probably meshes
Conclusion

As you can see, it was not that hard to make sense of how the .pak file is structured and then to write a script that unpacks its content. In future posts I will try to show examples with different difficulty levels (encryption, compression). Thanks for reading.