Nearly two years after I first poked around in Robin Hood's guts (I'm referring to the game, of course), I got the urge to investigate further, and I now have a rough idea of how the data in the rules files is stored.
I will be discussing the format based on the "ERULES.PRG" file, which contains English text. The other rules files should be in exactly the same format; only the string values should differ. All multi-byte values are little-endian (unless otherwise stated).
Before I start: if anyone wants to join forces in exploring the intricacies of the game, feel free to send me an e-mail: jestarjokin@jestarjokin.net
View after the break, IF YOU DAAAAARE...!
ERULES.PRG
The rules file is split into 10 sections (based on the number of functions called to read in the data). The file begins with a small 2-byte header (containing a value of 0x0000); the 10 sections follow.
Section | Start | End | Size (bytes)
Header | 0x00000000 | 0x00000002 | 0x00000002
Section 1 | 0x00000002 | 0x00000666 | 0x00000664
Section 2 | 0x00000666 | 0x000012E6 | 0x00000C80
Section 3 | 0x000012E6 | 0x0000628E | 0x00004FA8
Section 4 | 0x0000628E | 0x00006502 | 0x00000274
Section 5 | 0x00006502 | 0x00007260 | 0x00000D5E
Section 6 | 0x00007260 | 0x000110E6 | 0x00009E86
Section 7 | 0x000110E6 | 0x00011122 | 0x0000003C
Section 8 | 0x00011122 | 0x000113BE | 0x0000029C
Section 9 | 0x000113BE | 0x000114E8 | 0x0000012A
Section 10 | 0x000114E8 | 0x00011562 | 0x0000007A
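The sketch below is a sanity check on the table, not the game's own loader. It confirms that the sections sit back to back and that each Size is just End minus Start. It also reads the 2-byte header as a little-endian word. The names `SECTION_BOUNDS`, `section_sizes` and `read_header` are hypothetical.

```python
import struct

# (name, start, end) offsets for ERULES.PRG, copied from the table above.
SECTION_BOUNDS = [
    ("Header", 0x00000, 0x00002),
    ("Section 1", 0x00002, 0x00666),
    ("Section 2", 0x00666, 0x012E6),
    ("Section 3", 0x012E6, 0x0628E),
    ("Section 4", 0x0628E, 0x06502),
    ("Section 5", 0x06502, 0x07260),
    ("Section 6", 0x07260, 0x110E6),
    ("Section 7", 0x110E6, 0x11122),
    ("Section 8", 0x11122, 0x113BE),
    ("Section 9", 0x113BE, 0x114E8),
    ("Section 10", 0x114E8, 0x11562),
]


def section_sizes(bounds):
    """Return {name: size}, raising if a section does not start where the previous one ended."""
    sizes = {}
    prev_end = bounds[0][1]
    for name, start, end in bounds:
        if start != prev_end:
            raise ValueError(f"gap or overlap before {name}")
        sizes[name] = end - start
        prev_end = end
    return sizes


def read_header(data: bytes) -> int:
    """Read the 2-byte little-endian header word (0x0000 in ERULES.PRG)."""
    (value,) = struct.unpack_from("<H", data, 0)
    return value
```

Because the sections are contiguous, their sizes add up to the total file length of 0x11562 bytes.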
Section 1
Section 1 is pretty simple. The number of entries is given as 2 bytes, followed by that many bytes (one byte per entry).
Section 1
Data | Size (bytes)
num_entries | 2
entry+ | 1 * num_entries
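Here is a minimal Python sketch of reading Section 1 as described: a 2-byte count, then that many single bytes. The function name `read_section1` is hypothetical. It returns the position after the section so that sections can be read one after another.

```python
import struct


def read_section1(data: bytes, pos: int = 0):
    """Read a 2-byte entry count followed by that many 1-byte entries.

    Returns (entries, new_pos).
    """
    (num_entries,) = struct.unpack_from("<H", data, pos)
    pos += 2
    entries = data[pos:pos + num_entries]
    if len(entries) != num_entries:
        raise ValueError("Section 1 is truncated")
    return entries, pos + num_entries
```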
Section 2
Section 2 starts with 2 bytes for the number of entries.
Each entry is made up of several components, starting with s2WordA (2 bytes). If this value is not 0xFFFF, it is multiplied by 8 and then 4 is added to it when read. The same happens for s2WordB, another 2 bytes. s2WordC and s2WordD are each read as 2 bytes.
After these come 10 individual bytes, followed by two arrays of 32 bytes each.
In total, one entry is 82 bytes long (8 + 10 + 2 * 32). Robin Hood contains 39 entries, which matches the section size of 0xC80 bytes (2 + 82 * 39 = 3200).
Section 2
Data | Size (bytes)
num_entries | 2
entries | 82 * num_entries
Section 2 Entry
Data | Size (bytes)
s2WordA | 2
s2WordB | 2
s2WordC | 2
s2WordD | 2
s2ByteA | 1
s2ByteB | 1
s2ByteC | 1
s2ByteD | 1
s2ByteE | 1
s2ByteF | 1
s2ByteG | 1
s2ByteH | 1
s2ByteI | 1
s2ByteJ | 1
s2ArrayA | 32
s2ArrayB | 32
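The Python sketch below reads Section 2 as described, including the "times 8, plus 4" adjustment of s2WordA and s2WordB. It assumes all words are unsigned. It also assumes the adjusted value is not truncated back to 16 bits, since that is not known. The helper names `_scaled` and `read_section2` are hypothetical, and the ten single bytes are grouped into one list for brevity.

```python
import struct

ENTRY_SIZE = 82  # 4 words (8 bytes) + 10 single bytes + 2 arrays of 32 bytes


def _scaled(word: int) -> int:
    # Values other than 0xFFFF are multiplied by 8, then 4 is added.
    # Assumption: no 16-bit masking of the result.
    return word if word == 0xFFFF else word * 8 + 4


def read_section2(data: bytes, pos: int = 0):
    """Return (entries, new_pos) for Section 2."""
    (num_entries,) = struct.unpack_from("<H", data, pos)
    pos += 2
    entries = []
    for _ in range(num_entries):
        word_a, word_b, word_c, word_d = struct.unpack_from("<4H", data, pos)
        base = pos + 8  # first byte after the four words
        entries.append({
            "s2WordA": _scaled(word_a),
            "s2WordB": _scaled(word_b),
            "s2WordC": word_c,
            "s2WordD": word_d,
            "s2Bytes": list(data[base:base + 10]),  # s2ByteA .. s2ByteJ
            "s2ArrayA": data[base + 10:base + 42],
            "s2ArrayB": data[base + 42:base + 74],
        })
        pos += ENTRY_SIZE
    return entries, pos
```

With 39 entries the reader advances 2 + 39 * 82 = 0xC80 bytes, matching the section table.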
Section 3
Section 3 contains string data! It begins with the number of entries (2 bytes) and the total size of the string data in sub-section 2 (2 bytes). Two sub-sections follow.
The first sub-section contains 2-byte values; each gives the start of a string as an offset from the beginning of the string data chunk (sub-section 2). For example, if the first string in sub-section 2 is "Wh\xCEps!\x00", it is 7 bytes long, including the null terminator (the "\x" values are my way of denoting raw byte values). Therefore, the first entry in sub-section 1 is 0x00, and the second entry is 0x07.
The second sub-section contains all of the string values. The strings are all null-terminated. They are also compressed in an unusual manner: characters with a value over 127 (the threshold may be higher) are treated specially, and converted using a lookup table hardcoded into the executable. For example, the value 0xC8 is translated to "oo" (double O). This compresses the string "Whoops!" down to "Wh.ps!", with "." representing the value 0xC8. Other common letter combinations are compressed too, such as "me", "or", "I am", etc.
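Below is a minimal Python sketch of the decompression step. It assumes every byte above 127 is a lookup-table code and everything else is a plain ASCII character. `EXAMPLE_TABLE` is a hypothetical placeholder holding only the one pair mentioned above; the real table lives in the executable and is not reproduced here. Codes missing from the placeholder are shown as `\xNN` rather than guessed.

```python
# Placeholder: only the single pair described in the text.
# The game's real lookup table is hardcoded in the executable.
EXAMPLE_TABLE = {0xC8: "oo"}


def decompress(raw: bytes, table=EXAMPLE_TABLE) -> str:
    """Expand a null-terminated rules string, treating bytes above 127 as table codes."""
    out = []
    for b in raw:
        if b == 0:  # strings are null-terminated
            break
        if b > 127:
            # Unknown codes stay visible as \xNN instead of being guessed.
            out.append(table.get(b, f"\\x{b:02X}"))
        else:
            out.append(chr(b))
    return "".join(out)
```

For example, `decompress(b"Wh\xC8ps!\x00")` gives `"Whoops!"`.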
Strings are also interesting in that they can contain multiple variations of one line of dialogue. These are represented by nesting one or more variations at the start of the string, each enclosed in square brackets, e.g.
"[[Hello my sweet.]Hello darling.]Good morrow, my love."
It seems the game will randomly (or perhaps linearly) choose which variation to display.
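One way to read this nesting is sketched below. The text inside the outermost brackets holds the earlier variations, which may themselves be bracketed, and the text after the closing bracket is the last one. This interpretation is an assumption based only on the single example above, and `split_variants` is a hypothetical helper.

```python
from typing import List


def split_variants(text: str) -> List[str]:
    """Split "[[A]B]C"-style strings into ["A", "B", "C"] (assumed interpretation)."""
    if not text.startswith("["):
        return [text]
    depth = 0
    for i, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                # Inside the outer brackets: the earlier variants (possibly nested).
                # After the closing bracket: the final variant.
                return split_variants(text[1:i]) + [text[i + 1:]]
    raise ValueError(f"unbalanced brackets in {text!r}")
```

Applied to the example, it yields "Hello my sweet.", "Hello darling." and "Good morrow, my love.". Passing that list to `random.choice` would mimic the random selection, if that is indeed what the game does.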
Section 3
Data | Size (bytes)
num_entries | 2
string_data_size | 2
sub-section 1 | 2 * num_entries
sub-section 2 | string_data_size
Section 3 Sub-Section 1
Data | Size (bytes)
string_offset | 2 * num_entries
Section 3 Sub-Section 2
Data | Size (bytes)
string_data | variable size * num_entries (totalling string_data_size)
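The Python sketch below follows the Section 3 layout. It reads the two counts, then the offset table, then the string blob, and cuts each string at its null terminator. It leaves the strings as still-compressed bytes. `read_section3` is a hypothetical name, and it assumes the offsets are unsigned 16-bit values.

```python
import struct


def read_section3(data: bytes, pos: int = 0):
    """Return (raw_strings, new_pos); the strings are still compressed bytes."""
    num_entries, data_size = struct.unpack_from("<HH", data, pos)
    pos += 4
    # Sub-section 1: one 2-byte offset per string, relative to the string data.
    offsets = struct.unpack_from(f"<{num_entries}H", data, pos)
    pos += 2 * num_entries
    # Sub-section 2: null-terminated strings, string_data_size bytes in total.
    blob = data[pos:pos + data_size]
    pos += data_size
    strings = [blob[off:blob.index(b"\x00", off)] for off in offsets]
    return strings, pos
```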
Section 4
Section 4 starts with 2 bytes for the number of entries, followed by that many entries. Each entry is a word (16-bit).
Section 4
Data | Size (bytes)
num_entries | 2
entry+ | 2 * num_entries
Section 5
Section 5 starts with 2 bytes for the number of entries, followed by that many entries. Each entry is a word (16-bit). This seems to be the same format as Section 4.
Section 5
Data | Size (bytes)
num_entries | 2
entry+ | 2 * num_entries
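Sections 4 and 5 appear to share a layout, so one small Python reader can cover both: a 2-byte count followed by that many little-endian 16-bit words. `read_word_list` is a hypothetical name.

```python
import struct


def read_word_list(data: bytes, pos: int = 0):
    """Read a 2-byte count followed by that many 16-bit little-endian words."""
    (num_entries,) = struct.unpack_from("<H", data, pos)
    pos += 2
    words = list(struct.unpack_from(f"<{num_entries}H", data, pos))
    return words, pos + 2 * num_entries
```

Going by the section table, Section 4 (0x274 bytes) would hold 313 words and Section 5 (0xD5E bytes) would hold 1710.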
Section 6
Section 6 starts with 2 bytes for the number of entries, followed by that many entries. Each entry is a word (16-bit). This is followed by another 2-byte count for a second list of entries; each of these second entries is only one byte long.
Section 6
Data | Size (bytes)
num_entries_one | 2
entry_one+ | 2 * num_entries_one
num_entries_two | 2
entry_two+ | 1 * num_entries_two
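The Python sketch below reads the two counted lists of Section 6 back to back: 16-bit words first, then single bytes. `read_section6` is a hypothetical name.

```python
import struct


def read_section6(data: bytes, pos: int = 0):
    """Read a counted list of 16-bit words, then a counted list of single bytes."""
    (num_one,) = struct.unpack_from("<H", data, pos)
    pos += 2
    words = list(struct.unpack_from(f"<{num_one}H", data, pos))
    pos += 2 * num_one
    (num_two,) = struct.unpack_from("<H", data, pos)
    pos += 2
    small = list(data[pos:pos + num_two])
    return words, small, pos + num_two
```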
Section 7
Section 7 is hardcoded as being 60 (0x3C) bytes long. Each value in this section seems to be a single byte.
Section 7
Data | Size (bytes)
s7ByteA | 1 * 60
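Section 7 has no count field, so a reader only needs the hardcoded length. A minimal Python sketch follows, treating each value as a single byte as described. `read_section7` and `SECTION7_SIZE` are hypothetical names.

```python
SECTION7_SIZE = 60  # hardcoded length, 0x3C bytes


def read_section7(data: bytes, pos: int = 0):
    """Read the fixed-size Section 7 block as a list of single-byte values."""
    values = list(data[pos:pos + SECTION7_SIZE])
    if len(values) != SECTION7_SIZE:
        raise ValueError("Section 7 is truncated")
    return values, pos + SECTION7_SIZE
```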
Section 8
Section 8 starts with 1 byte for the number of entries. The number of entries can be 0. There are two sub-sections here.
Section 8 Sub-Section 1 contains an array of bytes. Each byte represents the size of the data chunk in Sub-Section 2. This array is summed to get the total size of the data.
Section 8 Sub-Section 2 contains the variable-sized data. The size of this section is determined by the summed value generated when reading Sub-Section 1.
Section 8
Data | Size (bytes)
num_entries | 1
Section 8 Sub-Section 1
Data | Size (bytes)
size_of_data | 1 * num_entries
Section 8 Sub-Section 2
Data | Size (bytes)
data_chunk | sum of all size_of_data values (variable size per chunk)
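The Python sketch below reads Section 8. It takes a 1-byte count, which may be 0, then one size byte per chunk, then the chunks back to back. It assumes chunk *i* in Sub-Section 2 has the size given by byte *i* in Sub-Section 1. `read_section8` is a hypothetical name.

```python
def read_section8(data: bytes, pos: int = 0):
    """Read Section 8 into a list of variable-sized byte chunks."""
    num_entries = data[pos]  # a single byte; may be 0
    pos += 1
    # Sub-section 1: one size byte per chunk.
    sizes = list(data[pos:pos + num_entries])
    pos += num_entries
    # Sub-section 2: the chunks back to back, sum(sizes) bytes in total.
    chunks = []
    for size in sizes:
        chunks.append(data[pos:pos + size])
        pos += size
    return chunks, pos
```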
Section 9
Section 9 starts with 2 bytes for the number of entries. Each entry consists of four 16-bit words (8 bytes in total).
In Robin Hood, there are 37 entries.
Section 9
Data | Size (bytes)
num_entries | 2
entry+ | 8 * num_entries
Section 9 Entry
Data | Size (bytes)
s9WordA | 2
s9WordB | 2
s9WordC | 2
s9WordD | 2
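The Python sketch below reads Section 9 as a count plus entries of four unsigned 16-bit words each. `read_section9` is a hypothetical name. With 37 entries it consumes 2 + 37 * 8 = 298 = 0x12A bytes, which matches the section table.

```python
import struct


def read_section9(data: bytes, pos: int = 0):
    """Read a 2-byte count followed by entries of four 16-bit words (8 bytes each)."""
    (num_entries,) = struct.unpack_from("<H", data, pos)
    pos += 2
    entries = [struct.unpack_from("<4H", data, pos + 8 * i) for i in range(num_entries)]
    return entries, pos + 8 * num_entries
```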
Section 10
Section 10 is an odd one. It contains a 2-byte value, followed by an array of 20 bytes, then an array of 20 words, then another array of 20 words, then another array of 20 bytes.
The values of the last array are manipulated as each value is read in, like so:
if val == 0x20:
    val = 0x39
elif val == 0x06:
    val = 0x1C
else:
    val = val - 0x41
Section 10
Data | Size (bytes)
s10WordA | 2
s10ArrayA | 1 * 20
s10ArrayB | 2 * 20
s10ArrayC | 2 * 20
s10ArrayD | 1 * 20
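The Python sketch below puts Section 10 together, applying the adjustment above to each byte of the last array. Its fields total 2 + 20 + 40 + 40 + 20 = 122 = 0x7A bytes, which matches the section table. `adjust` and `read_section10` are hypothetical names. The words are assumed to be unsigned. Python integers do not wrap, and it is not known whether the game wraps the subtraction to 8 bits, so no wrapping is applied.

```python
import struct


def adjust(val: int) -> int:
    # Transform applied to each byte of the last array, as described above.
    # Values below 0x41 (other than 0x06 and 0x20) go negative here;
    # whether the game wraps them to 8 bits is unknown.
    if val == 0x20:
        return 0x39
    elif val == 0x06:
        return 0x1C
    return val - 0x41


def read_section10(data: bytes, pos: int = 0):
    """Return (fields, new_pos) for Section 10."""
    (word_a,) = struct.unpack_from("<H", data, pos)
    pos += 2
    array_a = list(data[pos:pos + 20])
    pos += 20
    array_b = list(struct.unpack_from("<20H", data, pos))
    pos += 40
    array_c = list(struct.unpack_from("<20H", data, pos))
    pos += 40
    array_d = [adjust(v) for v in data[pos:pos + 20]]
    pos += 20
    fields = {"s10WordA": word_a, "s10ArrayA": array_a, "s10ArrayB": array_b,
              "s10ArrayC": array_c, "s10ArrayD": array_d}
    return fields, pos
```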
That was a massive wall of text, right? I've also added it to the [http://rewiki.regengedanken.de/wiki/.PRG|REWiki] site.