Unpacking Garfield Game Files
You have probably heard of Garfield, the orange cat who loves sleep and lasagna. It turns out he is not only the protagonist of comics and cartoons, but also of computer games. One such game, uniquely named Garfield, was released for PC in 2004 by a company named The Code Monkeys. In this post, we will take a look at the structure of the files used by the game.
Getting started
After installation, inside the main directory we can see the data folder, where all the assets lie. The content of audio and fmv is easily accessible, since it is not packed and the files inside use widely known formats, such as .wav. More interesting things, such as images and 3D models, are probably hidden inside the mysterious .pak files. Since this is most likely a proprietary format, the content of those files is inaccessible to most people. Not to us, however.
The first thing to do when dealing with an unknown file format is to open the file in your favourite hex editor. As we can see, the first 4 bytes, when converted to ASCII, are equal to “PACK”. As this value is constant across all the .pak files, it is most likely the header, which lets the game parser know that it is dealing with the right kind of file. In the next bytes we can spot the “PNG” string, and just before it there is a byte equal to 0x89. Together they form the beginning of a .png image. So we know that there is one packed inside this file, and we also know that it is neither compressed nor encrypted (otherwise we wouldn’t see the raw PNG header). But how long is it? Does this image have an index or a name? Is it the only image in this .pak file?
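The same first look can be scripted. The sketch below checks for the “PACK” bytes and searches for the standard 8-byte PNG signature. The function name inspect_start and the sample bytes are made up for illustration; they are not data from the game.

```python
PAK_MAGIC = b"PACK"
# Standard PNG signature: 0x89, then "PNG", then CR LF, 0x1A, LF.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def inspect_start(data: bytes) -> dict:
    """Report whether data starts with PACK and where the first PNG begins."""
    return {
        "has_pack_magic": data[:4] == PAK_MAGIC,
        # bytes.find returns -1 when the signature is not present at all.
        "first_png_offset": data.find(PNG_SIGNATURE),
    }


# Synthetic sample (an assumption, not real game data):
# magic, 8 placeholder bytes, then a PNG signature.
sample = PAK_MAGIC + bytes(8) + PNG_SIGNATURE + b"rest of the image"
print(inspect_start(sample))
```

On a real file you would pass the bytes read from the .pak instead of the synthetic sample.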
The unknown bytes
Between “PACK” and “PNG” we have 8 unknown bytes. The whole file is 0x1D8018 bytes long. When we interpret the first 4 bytes after “PACK” as a little-endian value, it looks like some kind of offset pointing almost to the end of the file - 0x1D5518. When we go there, we can see the name of a .png file, so our suspicions were right. It is indeed an offset to some valuable information.
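To see why the little-endian reading is the plausible one, here is a small sketch that decodes the same 4 bytes both ways and compares the results with the file size. The raw byte sequence is simply the little-endian encoding of the value quoted above, so treat it as an assumption about what the hex editor shows.

```python
FILE_SIZE = 0x1D8018  # total size of the .pak file, as stated in the text

# Assumed on-disk bytes: the little-endian encoding of 0x1D5518.
raw = bytes([0x18, 0x55, 0x1D, 0x00])

little = int.from_bytes(raw, "little")
big = int.from_bytes(raw, "big")

print(hex(little), little < FILE_SIZE)  # 0x1d5518 True
print(hex(big), big < FILE_SIZE)        # 0x18551d00 False

# Distance from the offset to the end of the file.
print(hex(FILE_SIZE - little))          # 0x2b00
```

Only the little-endian value lands inside the file, which is what makes it look like an offset.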
We still have 4 unknown bytes. We could read them in multiple ways: as before, as a 4-byte little-endian integer, but maybe we should read them byte by byte? Or maybe they are two 2-byte integers? What does this value represent? We still don’t know how to read the size of that PNG image. Questions arise, but is there a way to answer them without guessing?
It turns out that this can indeed be solved without guessing. Don’t get me wrong: it is perfectly fine to solve it by trial and error (which we will kinda come back to later in this post), especially for a file format that looks this easy. But to make it more entertaining and more educational, let me introduce a more powerful method that will answer our questions.
Fun with debugger
Let’s attach a debugger to the game. Obviously we don’t have the source code, so we are forced to debug it at the assembly level. For this task I will use x64dbg. The first question that immediately comes to mind is where to look. The game binary is huge and contains millions of instructions, most of which are irrelevant to us. A good first step is to look at the string references. There is a high probability that at least one of the .pak filenames is hardcoded and used as a parameter to the “open file” function.
To look for string references in x64dbg we can use the button shown above. Please note that references are searched per module, so you need to make sure you are in the main one. If you want to change the current module, you can go to the Symbols tab and double-click the module you are interested in.
And here it is. As the function is in the export table (a bit unusual for an .exe file), we can see its name, so we can be even more certain that we are looking at the right place.
Reversing the Load_PakFile function
First, we put a software breakpoint inside this function and resume program execution. The breakpoint is indeed hit and we can step through the function. Now the question is: what does this function do with the file data, and how does it treat our unknown bytes? Before we answer this, we first need to understand how the game transfers bytes from the hard disk to RAM at all.
Reading file to RAM
You see, on modern platforms like Windows a user-mode application (like a game) cannot talk directly to hardware (like a hard disk). To retrieve information from hardware, applications need to send requests to the kernel. On Windows such requests can be sent using WinAPI functions. So if you are writing an application that works with files on Windows, it will eventually use some WinAPI calls under the hood, even if you aren’t aware of it. For our task, we are interested in the functions that give us access to files. There are multiple ways to do this using WinAPI, but the most common one is to use CreateFile to open a handle, and then ReadFile to read data into RAM.
Breakpoints on API calls
To put a breakpoint on an API call in x64dbg you can press Ctrl+G and then type the name of the API function. In our case - CreateFileA and ReadFile. Just one more thing to note. On Windows, a WinAPI function that accepts a string as a parameter usually comes in two versions, ASCII and Unicode. The ASCII ones end with A, and the Unicode ones with W. That is the reason why there are two functions for opening a handle to a file - CreateFileA and CreateFileW. The ASCII function typically converts its string arguments and ends up calling the Unicode one, so if you are not sure which version is used by the application, it is safer to put the breakpoint on the Unicode call first. For this game, however, you can safely put the breakpoint on the ASCII one, as this is the function the game uses.
Digging deeper
As you remember, we are now paused at the beginning of Load_PakFile. After stepping over the third call we stop on CreateFileA. Looks good so far: this means that this call is there to open a handle to the file. We go back to the function. On the way back we also notice that GetFileSize and SetFilePointer are called. This might be useful later.
At this point we are just after the third call.
Since we have seen that SetFilePointer is used, I placed a breakpoint there too and continued stepping. And it pays off: after a few calls we stop at SetFilePointer. From the WinAPI documentation we know that this function accepts four parameters. We can look at the stack to see their values.
The first one is the file handle, a number that may be different on each run. The next two parameters are zero, and the last one is equal to 1 - FILE_CURRENT. So after this call the file cursor will have moved by zero bytes, counting from the current position. Looks kinda useless. We continue stepping.
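A side note on that zero-byte move: SetFilePointer returns the new position of the file pointer, so moving by zero bytes relative to the current position is a common way to ask where the cursor currently is. Whether the game calls it for that reason is only an assumption. A rough Python analogue, using an in-memory file instead of a real Windows handle:

```python
import io
import os

# In-memory stand-in for the opened .pak file (synthetic bytes, not game data).
f = io.BytesIO(b"PACK" + bytes(60))

f.read(4)  # consume the magic, so the cursor sits at offset 4

# Move by 0 bytes relative to the current position.
# os.SEEK_CUR has the value 1, the same number as FILE_CURRENT in WinAPI.
pos = f.seek(0, os.SEEK_CUR)

print(pos)  # 4: the cursor did not move, but we learned where it is
```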
And we finally land at ReadFile, the first call that reads file data into RAM. Developers can use several different techniques here: for example, they can read the data chunk by chunk, or read everything at once and parse it later in the program. In this case, it looks like they decided to read only as many bytes as they need at the moment. Look at the parameters in the WinAPI documentation and compare them with what we see on the stack:
So they want to read 4 bytes from the .pak file into a data buffer at 0x18FDBC. After we go back to Load_PakFile we see that the bytes read from the file are compared with 0x4B434150, or PACK in ASCII. So we were right. PACK is a header, so the game parser can do a sanity check and be sure that it is processing a valid .pak file.
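If the constant looks backwards compared to the text, that is just little-endian byte order at work. A short check shows that the 4 bytes “PACK”, read as a little-endian 32-bit integer, give exactly the value from the comparison:

```python
import struct

MAGIC_AS_INT = 0x4B434150  # the constant the game compares against

# "<I" means: one unsigned 32-bit integer, little-endian.
(value,) = struct.unpack("<I", b"PACK")

print(hex(value))                         # 0x4b434150
print(MAGIC_AS_INT.to_bytes(4, "little")) # b'PACK'
```

Reading right to left, 0x4B is 'K', 0x43 is 'C', 0x41 is 'A' and 0x50 is 'P'.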
Again, we continue with the flow and see another call to ReadFile. Another 4 bytes are read; as you remember, this is the offset to the first file name in the .pak file. Then another 4 bytes are read. Those are the ones “unknown” to us. Now we see that they should be treated as a single integer. Interestingly, the value is then shifted left by 6 bits, or in other words multiplied by 0x40 (64 in decimal).
At the end of the image above there is a call to [eax+108]. This is a wrapper for RtlAllocateHeap (I stepped into it, so I know), which allocates a buffer whose length is equal to the value of the unknown bytes shifted left by 6. The next call sets the file cursor to the first file name offset, and then, in a loop, the game reads the file in 64-byte chunks into the previously allocated buffer. The loop is executed as many times as the value of the 4-byte “unknown” integer.
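The sequence observed in the debugger can be mimicked in a few lines of Python. This is a sketch of the observed behaviour, not decompiled game code: the function name load_entry_table and the synthetic test file are made up, and io.BytesIO stands in for the real file handle.

```python
import io
import struct

CHUNK_SIZE = 64  # same as 1 << 6


def load_entry_table(f) -> bytes:
    """Mimic the observed reads: magic, offset, count, then count 64-byte chunks."""
    if f.read(4) != b"PACK":
        raise ValueError("not a .pak file")
    (table_offset,) = struct.unpack("<I", f.read(4))
    (count,) = struct.unpack("<I", f.read(4))

    # The game allocates count << 6 bytes, i.e. count * 64.
    buffer = bytearray(count << 6)

    f.seek(table_offset)
    for i in range(count):
        chunk = f.read(CHUNK_SIZE)
        if len(chunk) != CHUNK_SIZE:
            raise ValueError("file ends inside the table")
        buffer[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE] = chunk
    return bytes(buffer)


# Synthetic file: 12-byte header, 20 bytes of payload, then two 64-byte chunks.
payload = b"x" * 20
table = bytes([1]) * 64 + bytes([2]) * 64
blob = b"PACK" + struct.pack("<II", 12 + len(payload), 2) + payload + table

result = load_entry_table(io.BytesIO(blob))
print(len(result))  # 128
```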
Using obtained information
We could reverse the game further to check what is done with the data now sitting in the allocated buffer. However, we already have some new information that we can apply. Firstly, the previously unknown bytes are most likely the count of files inside the pak file. Secondly, the offset to the first filename is in reality the offset to the file information table, and each file information entry is probably 64 bytes long (as the game was reading it in chunks of that length).
Inspecting the file entry
Now, let's go back to the binary file at that filename offset. We see that the entry begins with the filename. The length of the name is nowhere to be found, so we can suspect that it should be read up to the first null character. The last 8 bytes of the entry look like two 4-byte integers holding the same value, repeated twice for some reason. That value looks like another offset or a file length. The same goes for the preceding 4 bytes: another offset that might be the start of the file, as it appears to increase with each subsequent entry.
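Put together, a single 64-byte entry could be decoded as below. This is a sketch under the assumptions made so far: the name field is taken to be the first 52 bytes (64 minus the three trailing integers), followed by the suspected data offset and the two suspected length fields. The function name parse_entry and the sample entry are hypothetical.

```python
import struct

ENTRY_SIZE = 64
NAME_FIELD = 52  # assumption: everything before the last three 4-byte integers


def parse_entry(entry: bytes):
    """Split one entry into (name, offset, size_a, size_b)."""
    if len(entry) != ENTRY_SIZE:
        raise ValueError("an entry must be exactly 64 bytes long")
    # The name runs up to the first null byte inside the name field.
    name = entry[:NAME_FIELD].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    # Three little-endian unsigned 32-bit integers at the end of the entry.
    offset, size_a, size_b = struct.unpack_from("<III", entry, NAME_FIELD)
    return name, offset, size_a, size_b


# Hypothetical entry built by hand, not copied from a real .pak file.
entry = b"textures/logo.png".ljust(NAME_FIELD, b"\x00") + struct.pack("<III", 12, 1000, 1000)
print(parse_entry(entry))  # ('textures/logo.png', 12, 1000, 1000)
```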
Now we go to the suspected file start offset, and from there we select the next X bytes, where X is the value of the last 4 bytes of the file entry (the suspected file length). We save those bytes to a separate file with a png extension and try to open it:
We are greeted with an image. Now we can write a simple Python script to automate the unpacking process, and also to verify that it works for all the files.
Code to unpack the .pak file
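A minimal sketch of such an unpacker, assuming the layout inferred above: a “PACK” magic, a 4-byte table offset, a 4-byte file count, and 64-byte entries holding a null-terminated name followed by the data offset and the length stored twice. The names iter_entries and unpack and the 52-byte name field are choices made for this sketch, not something taken from the game.

```python
import os
import struct

ENTRY_SIZE = 64
NAME_FIELD = 52  # assumption: 64 bytes minus offset and two length fields


def iter_entries(data: bytes):
    """Yield (name, file_bytes) for every entry in the table."""
    magic, table_offset, count = struct.unpack_from("<4sII", data, 0)
    if magic != b"PACK":
        raise ValueError("missing PACK magic")
    for i in range(count):
        start = table_offset + i * ENTRY_SIZE
        entry = data[start:start + ENTRY_SIZE]
        if len(entry) != ENTRY_SIZE:
            raise ValueError(f"entry {i} is truncated")
        name = entry[:NAME_FIELD].split(b"\x00", 1)[0].decode("ascii", "replace")
        # The length appears twice; this sketch uses the first copy.
        offset, size, _size_copy = struct.unpack_from("<III", entry, NAME_FIELD)
        yield name, data[offset:offset + size]


def unpack(pak_path: str, out_dir: str) -> list:
    """Write every packed file below out_dir and return the written paths."""
    with open(pak_path, "rb") as f:
        data = f.read()
    out_root = os.path.abspath(out_dir)
    written = []
    for name, blob in iter_entries(data):
        parts = [p for p in name.replace("\\", "/").split("/") if p]
        if not parts:
            continue  # skip entries without a usable name
        target = os.path.abspath(os.path.join(out_root, *parts))
        # Refuse names such as "../x" that would escape the output directory.
        if os.path.commonpath([out_root, target]) != out_root:
            raise ValueError(f"unsafe path in archive: {name!r}")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as out:
            out.write(blob)
        written.append(target)
    return written
```

Calling unpack with the path to a .pak file and an output directory writes each packed file under that directory. Reading the whole archive into memory keeps the sketch short, which is fine for files of a couple of megabytes like the one inspected above.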
What is inside?
All the files have now been unpacked, and it looks like it worked correctly, as the txt files can be read, and so can the png images. For game modding, the fonts directory may be interesting, as it contains fonts and dialogs in different languages. So, for example, if you would like to translate the game into your language, you could modify those dialogs and then repack the files. Interestingly, when I run the game the language is set to Polish and it seems it cannot be changed, yet there is no Polish font in the fonts folder. It turns out the file named english.lng contains the translated Polish dialogs.
To prove that this script works for other files, I unpacked some of them, like attic.pak. Its content is a little bit different and consists of files with extensions such as:
- png - Probably obvious: images of some assets, loading screens, etc.
- rws - The most mysterious file extension; probably contains animations and textures
- ape - A plain text file that describes where the checkpoints are on the map, plus some information to assist the AI
- bin - Probably meshes
Conclusion
As you can see, it was not that hard to make sense of how the pak file is structured and then to write a script to unpack its content. In future posts I will try to show examples with different difficulty levels (encryption, compression). Thanks for reading.