This post will discuss the structure of CML files.
CML File Format
CML is assumed to stand for "Character Markup Language", and the file's data structure does resemble a markup language. Some basic info:
- The file format contains no pointers at all.
- To organise data, the file uses tags and elements.
- Each tag carries a count (usually a short) telling the parser how many elements belong to it.
- Elements within tags contain an element ID. The format isn't entirely understood and seems oddly inconsistent.
- Tags are NOT 32-bit aligned.
- Tags can theoretically appear in any order and be split into as many parts as they like.
- Elements within tags can appear in any order. There may also be duplicate elements.
- Tags can contain no elements at all!
Overall File Structure
Header structure:
- "CML" (4 chars; the fourth byte is presumably padding)
- Total filesize (integer)
- Filesize (of the actual data, presumably starting from the VTBF tag)
- Location of the VTBF tag (integer)
- 8 unknown bytes (seem to be 2 integers)
- 0x28 bytes of 0s
- The filename.
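
To make the header layout above a bit more concrete, here's a rough Python sketch that unpacks the fixed-size fields. It assumes little-endian byte order and 4-byte integers, neither of which is confirmed, and the names CmlHeader and parse_header are made up for illustration. The filename is left alone since its length isn't known.

```python
import struct
from typing import NamedTuple

# Assumed layout (little-endian, 4-byte integers):
# magic, total size, data size, VTBF offset, 8 unknown bytes, 0x28 zero bytes.
HEADER_FORMAT = "<4sIII8s40s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 0x40 bytes under these assumptions


class CmlHeader(NamedTuple):  # hypothetical name
    magic: bytes
    total_size: int
    data_size: int
    vtbf_offset: int
    unknown: bytes
    padding: bytes


def parse_header(data: bytes) -> CmlHeader:
    """Unpack the fixed-size fields; the filename that follows is not parsed."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short for a CML header")
    header = CmlHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
    if not header.magic.startswith(b"CML"):
        raise ValueError("missing CML magic")
    return header
```

If those assumptions hold, the filename would start at offset 0x40.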
As you can probably gather from the previous section, CML files don't necessarily have a consistent structure. That said, most files follow roughly this layout (a representative example):
<VTBF>
<CMLP>
</CMLP>
<DOC>
</DOC>
<FIGR>
</FIGR>
<COLR>
</COLR>
<SLCT>
</SLCT>
</VTBF>
Tag Types
VTBF - Seems to contain the entire file.
CMLP - Not sure.
DOC - Contains simple biodata, such as Race and Gender
FIGR - Contains sliders for shapes, such as body, face, etc.
COLR - Contains sliders for colours.
SLCT - Contains IDs for parts, such as hairstyle, outfit, accessories, etc.
Tag Structure
A tag is delimited by 4-character ASCII string identifiers marking the start and end of its section.
A tag starts with any of the type names listed above and ends with the string "vtc0", unless the end of the section is the end of the file.
The tag name is usually followed by 2 empty bytes and then a short giving the number of elements the tag contains. For example, the bytes "46 49 47 52 00 00 0B 00" are the tag "FIGR", two zero bytes, and then the little-endian short 0x0B, meaning this tag contains 11 elements.
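
Here's a small Python sketch of reading that 8-byte tag header, using the example bytes from above. The function name parse_tag_header is hypothetical, and stripping trailing spaces or nulls from short names like DOC is an assumption about how 3-letter tags are padded.

```python
import struct


def parse_tag_header(data: bytes, offset: int = 0):
    """Return (tag name, element count) for the 8 bytes at `offset`."""
    # 4 ASCII chars, 2 (usually empty) bytes, then a little-endian short.
    name, _empty, count = struct.unpack_from("<4sHH", data, offset)
    # Assumption: names shorter than 4 chars are padded with spaces or nulls.
    return name.rstrip(b"\x00 ").decode("ascii"), count


sample = bytes.fromhex("46 49 47 52 00 00 0B 00")
print(parse_tag_header(sample))  # ('FIGR', 11)
```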
Element Structure
As you may imagine, this varies by tag. A SLCT element is not like a FIGR element. FIGR and COLR elements are practically identical, however.
- 1 byte for the element ID
- 2 unknown bytes for FIGR and COLR, 1 for SLCT.
- The element ID can be a plain hexadecimal number or a binary-coded decimal (BCD).
- FIGR element IDs seem to be hexadecimal.
- There aren't enough COLR elements to say.
- SLCT element IDs are binary-coded decimal.
FIGR and COLR elements contain 3 signed integers.
FIGR's ints are vertex1Ypos, vertex2Xpos, vertex2Ypos.
COLR's ints are SaturationYpos, ColorXpos, ColorYpos.
Note that the COLR values are just coordinates, not actual colour data. They index into a colour map, and each race has its own map. This is why a character's skin colour may change when you convert them from one race to another.
SLCT is simpler: each element is just an integer ID.
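
As a sketch of how the two element layouts could be read in Python: this assumes the integers are 4 bytes and little-endian, that the unknown bytes sit directly after the ID, and that the SLCT value is unsigned. None of that is confirmed, and the function names are made up.

```python
import struct

# Assumed layouts (no padding, little-endian):
FIGR_COLR_FORMAT = "<B2s3i"  # ID byte, 2 unknown bytes, 3 signed ints
SLCT_FORMAT = "<BsI"         # ID byte, 1 unknown byte, 1 integer part ID


def parse_figr_colr_element(data: bytes, offset: int = 0):
    """Return (raw element ID, (int1, int2, int3)) for a FIGR or COLR element."""
    element_id, _unknown, a, b, c = struct.unpack_from(FIGR_COLR_FORMAT, data, offset)
    return element_id, (a, b, c)


def parse_slct_element(data: bytes, offset: int = 0):
    """Return (raw element ID, part ID) for a SLCT element."""
    element_id, _unknown, part_id = struct.unpack_from(SLCT_FORMAT, data, offset)
    return element_id, part_id
```

Under these assumptions a FIGR/COLR element is 15 bytes and a SLCT element is 6 bytes, which fits the note that tags aren't 32-bit aligned.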
Element IDs
This is where things get a bit more theoretical. The theory is that each element ID corresponds to a given part of customization. For example, there are element IDs for the nose size slider, for eye color and for hairstyle. Going by the success of the CMLparser so far, the theory seems to hold up: it contains an enum that maps the CML's elements to array indices using this element ID. As such, this is the list of element IDs and the sections they match up to:
Note: The IDs themselves have been changed to make them work as array indices. For example, ID 0x25 is treated as "5", as the IDs for the COLR tag elements start at 0x20. It's plausible that some bits in the first nybble are used as flags.
DOC (starts at 0x70)
0 - Race
1 - Sex
2 - Unknown
3 - Unknown
FIGR (starts at 0x0)
0 - MainBody
1 - Arm
2 - Leg
3 - Bust
4 - Unknown1
5 - FaceShape
6 - EyeShape
7 - NoseHeight
8 - NoseShape
9 - Mouth
10 - Ears/Horns
11 - Neck
12 - Waist
13 - MainBody2
14 - Arm2
15 - Leg2
16 - Bust2
17 - Unknown2
18 - Neck2
19 - Waist2
20 - Unknown3
21 - Unknown4
COLR (starts at 0x20)
0 - Costume
1 - Main
2 - Sub1
3 - Skin/Sub2
4 - LeftEye
5 - RightEye/Eyes
6 - Hair
7 - Unknown
SLCT (starts at 0x40. Uses BCD)
0 - Costume/OuterWear/BodyPart
1 - BodyPaint
2 - Sticker
3 - RightEye/Eyes
4 - EyeBrow
5 - EyeLash
6 - FaceType
7 - Unknown1
8 - MakeUp1
9 - HairStyle
10 - Acc1
11 - Acc2
12 - Acc3
13 - MakeUp2
14 - LegPart
15 - ArmPart
16 - Acc4
17 - BaseWear
18 - InnerWear
19 - BodyPaint2
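
The ID-to-index conversion described in the note could be sketched like this in Python. The function names are hypothetical, and treating the SLCT base 0x40 as BCD (i.e. decimal 40) before subtracting is my reading of "starts at 0x40. Uses BCD", not something confirmed.

```python
# Base IDs per tag, taken from the list above.
TAG_BASE = {"DOC": 0x70, "FIGR": 0x00, "COLR": 0x20}
SLCT_BASE = 40  # assumption: 0x40 read as BCD is decimal 40


def bcd_to_int(value: int) -> int:
    """Decode one packed BCD byte, e.g. 0x52 -> 52."""
    high, low = value >> 4, value & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"0x{value:02X} is not valid BCD")
    return high * 10 + low


def element_index(tag: str, raw_id: int) -> int:
    """Convert a raw element ID into the array index used in the list above."""
    if tag == "SLCT":
        return bcd_to_int(raw_id) - SLCT_BASE
    return raw_id - TAG_BASE[tag]


print(element_index("COLR", 0x25))  # 5
```

With this reading, a SLCT ID of 0x52 would land on index 12 (Acc3), and a FIGR ID of 0x15 on index 21 (Unknown4).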
Bio Data
As the bio data only holds a small set of values, I might as well share them here.
Possible race values:
0 - Human
1 - Newman
2 - Cast
3 - Dewman
Possible sex values:
0 - Male
1 - Female
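
If you want these in code, the two tables above map neatly onto enums. This is just a convenience sketch in Python; the class and function names are made up, and only the numeric values come from the lists above.

```python
from enum import IntEnum


class Race(IntEnum):
    HUMAN = 0
    NEWMAN = 1
    CAST = 2
    DEWMAN = 3


class Sex(IntEnum):
    MALE = 0
    FEMALE = 1


def describe(race_value: int, sex_value: int) -> str:
    """Turn raw DOC values into a readable label; unknown values raise ValueError."""
    return f"{Sex(sex_value).name.title()} {Race(race_value).name.title()}"


print(describe(2, 1))  # Female Cast
```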