
CML file structure

Posted: Sun Mar 27, 2016 5:48 pm
by Chikinface
This is semi-related to the other thread. In this one I'll be talking about the data structures involved, for research purposes. This may help others down the line, and it may help me if someone later manages to fill in the gaps that I couldn't. I'm somewhat new to this process, so I may be using some terms incorrectly or misidentifying datatypes. Point it out to me if so.

This post will discuss the structure of CML files.

CML File Format

It's assumed that this is an acronym for "Character Markup Language". Indeed the data structure of the file seems to resemble one. Some basic info:

  • The file format contains no pointers at all.
  • To organise data, the file uses tags and elements.
  • Each tag has a bit of data (usually a short) telling the parser how many elements belong to that tag.
  • Elements within tags contain an element ID. The format isn't entirely understood and seems oddly inconsistent.
  • Tags are NOT 32-bit aligned.
  • Tags can theoretically appear in any order and be split into as many parts as they like.
  • Elements within tags can appear in any order. There may also be duplicate elements.
  • Tags can contain no elements at all!

Overall file Structure

Header structure:
  • "CML" (4 chars)
  • Total filesize (integer)
  • Filesize (of the actual data, presumably starting from the VTBF tag)
  • Location of the VTBF tag (integer)
  • 8 unknown bytes (seem to be 2 integers)
  • 0x28 bytes of 0s
  • The filename.
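
A rough sketch of reading that header in Python. Several things here are my assumptions, not confirmed facts: all integers are little-endian and 32 bits wide (including the second filesize, whose type isn't stated above), the filename is ASCII and ends at the first NUL byte, and the names `CmlHeader` and `parse_header` are made up for illustration.

```python
import struct
from typing import NamedTuple

class CmlHeader(NamedTuple):
    magic: bytes        # the 4 chars at the start of the file ("CML" plus one more byte)
    total_size: int     # total filesize
    data_size: int      # filesize of the actual data
    vtbf_offset: int    # location of the VTBF tag
    unknown: tuple      # the 8 unknown bytes, read as 2 integers
    filename: str

FIXED_HEADER_SIZE = 4 + 4 + 4 + 4 + 8 + 0x28  # 64 bytes before the filename

def parse_header(buf: bytes) -> CmlHeader:
    # Assumption: little-endian, unsigned 32-bit integers.
    magic, total_size, data_size, vtbf_offset, unk_a, unk_b = struct.unpack_from("<4s5I", buf, 0)
    # Skip the 0x28 zero bytes; the filename follows them.
    start = FIXED_HEADER_SIZE
    # Assumption: the filename is NUL-terminated ASCII.
    end = buf.find(b"\x00", start)
    if end == -1:
        end = len(buf)
    filename = buf[start:end].decode("ascii", errors="replace")
    return CmlHeader(magic, total_size, data_size, vtbf_offset, (unk_a, unk_b), filename)
```

Nothing here validates the sizes or the VTBF location against the real file, so treat it as a starting point for poking at files rather than a finished parser.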

As you can probably gather from the previous section, there isn't necessarily a consistent structure to cml files. That said, there is a generally consistent format (representative):
Code: Select all

Tag Types

VTBF - Seems to contain the entire file.
CMLP - Not sure.
DOC - Contains simple biodata, such as Race and Gender
FIGR - Contains sliders for shapes, such as body, face, etc.
COLR - Contains sliders for colours.
SLCT - Contains IDs for parts, such as hairstyle, outfit, accessories, etc.

Tag Structure

A tag itself has string identifiers for the start and end of sections of length 4 chars in ASCII encoding.
A tag can start with any of the above mentioned types. A tag ends on a string "vtc0", unless the end of the section is the end of the file.

Following the tag is usually 2 empty bytes followed by a short. This short represents the number of elements the tag contains. For example, the bytes "46 49 47 52 00 00 0B 00" are tag "FIGR" followed by 2 0s and then a 0x0B, which is saying this tag contains 11 elements.
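
The FIGR example above can be decoded with a few lines of Python. This is only a sketch: it assumes the short is little-endian (which is what makes "0B 00" come out as 11) and that the 2 empty bytes can be read as another short. `parse_tag_header` is just a name I picked.

```python
import struct

def parse_tag_header(buf: bytes, offset: int = 0):
    """Read a 4-char tag name, the 2 (usually empty) bytes, and the element count."""
    # Assumption: little-endian; the 2 empty bytes are read as one unsigned short.
    name, padding, count = struct.unpack_from("<4sHH", buf, offset)
    return name.decode("ascii"), padding, count

raw = bytes.fromhex("46 49 47 52 00 00 0B 00")
print(parse_tag_header(raw))  # ('FIGR', 0, 11)
```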

Element Structure

As you may imagine, this varies by tag. A SLCT tag element is not like a FIGR tag element. FIGR and COLR elements are practically identical, however.

  • 1 Byte for element ID
  • 2 unknown Bytes for FIGR and COLR. 1 for SLCT.
  • Element ID can be a hexadecimal number or it can be binary-coded decimal (BCD).
  • FIGR elements seem to be hexadecimal.
  • There aren't enough COLR elements to say.
  • SLCT element IDs are binary-coded decimal.

FIGR and COLR elements contain 3 signed integers.

FIGR's ints are vertex1Ypos, vertex2Xpos, vertex2Ypos.

COLR's ints are SaturationYpos, ColorXpos, ColorYpos

Note that COLR holds just coordinates, not actual colour data. The coordinates are used to look values up in a colour map, and each race has its own map. This is why a character's skin colour may change when you convert them from one race to another.

SLCT is simpler. SLCT is just an integer ID.

Element IDs

This is where things get a bit more theoretical. The theory is that element ID numbers correspond to a given part of customization. For example, there are element IDs for the nose size slider, for eye color and for hairstyle. Going by the success of the CMLparser so far, this theory seems to hold up: the parser contains an enum that maps the CML's elements to array indices using this element ID. As such, this is the list of element IDs and the sections they match up to:

Note: The IDs themselves have been changed to make them work as array indices. For example, ID 0x25 is treated as "5", as the IDs for the COLR tag elements start at 0x20. It's plausible that some bits in the first nybble are used as flags.

Code: Select all

DOC (starts at 0x70)
0 - Race
1 - Sex
2 - Unknown
3 - Unknown

FIGR (starts at 0x0)
0 - MainBody                   
1 - Arm                        
2 - Leg                        
3 - Bust                     
4 - Unknown1                  
5 - FaceShape                  
6 - EyeShape                  
7 - NoseHeight                  
8 - NoseShape                  
9 - Mouth                     
10 - Ears/Horns                  
11 - Neck                     
12 - Waist                     
13 - MainBody2                  
14 - Arm2                     
15 - Leg2                     
16 - Bust2                     
17 - Unknown2                  
18 - Neck2                     
19 - Waist2                     
20 - Unknown3                  
21 - Unknown4                  

COLR (starts at 0x20)
0 - Costume                  
1 - Main                     
2 - Sub1                     
3 - Skin/Sub2               
4 - LeftEye                  
5 - RightEye/Eyes            
6 - Hair                     
7 - Unknown                  

SLCT (starts at 0x40. Uses BCD)
0 - Costume/OuterWear/BodyPart   
1 - BodyPaint               
2 - Sticker                  
3 - RightEye/Eyes            
4 - EyeBrow                  
5 - EyeLash                  
6 - FaceType                  
7 - Unknown1                  
8 - MakeUp1                  
9 - HairStyle               
10 - Acc1                     
11 - Acc2                     
12 - Acc3                     
13 - MakeUp2                  
14 - LegPart                  
15 - ArmPart                  
16 - Acc4                     
17 - BaseWear                  
18 - InnerWear               
19 - BodyPaint2               
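
To make the ID-to-index conversion concrete, here is a small Python sketch of how I read the note above. The starting values come from the lists; treating SLCT IDs as BCD running from 40 upwards is my interpretation, and whether DOC IDs are hex or BCD makes no difference for indices 0 to 3. The function names are made up.

```python
# Starting raw ID for each tag, as listed above.
ID_BASE = {"DOC": 0x70, "FIGR": 0x00, "COLR": 0x20, "SLCT": 0x40}

def bcd_to_int(byte: int) -> int:
    """Decode one binary-coded decimal byte, e.g. 0x59 -> 59."""
    return (byte >> 4) * 10 + (byte & 0x0F)

def element_index(tag: str, raw_id: int) -> int:
    """Turn a raw element ID into the array index used in the lists above."""
    base = ID_BASE[tag]
    if tag == "SLCT":
        # Assumption: SLCT IDs are BCD, so 0x40 is decimal 40 and 0x59 is decimal 59.
        return bcd_to_int(raw_id) - bcd_to_int(base)
    return raw_id - base

print(element_index("COLR", 0x25))  # 5, as in the note above
print(element_index("SLCT", 0x59))  # 19, BodyPaint2 under the BCD assumption
```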

Bio Data

As this contains a small array of values, might as well share them here.

Possible race values:

Code: Select all
0 - Human
1 - Newman
2 - Cast
3 - Dewman

Possible sex values:

Code: Select all
0 - Male
1 - Female
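
In a parser these two lists map naturally onto enums. A minimal Python sketch, using only the values listed above (the class names are my own choice):

```python
from enum import IntEnum

class Race(IntEnum):
    HUMAN = 0
    NEWMAN = 1
    CAST = 2
    DEWMAN = 3

class Sex(IntEnum):
    MALE = 0
    FEMALE = 1

# Converting a raw value read from the DOC tag:
print(Race(3).name, Sex(1).name)  # DEWMAN FEMALE
```

Any value outside these lists raises a ValueError, which is handy for spotting values that haven't been documented yet.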

Re: CML file structure

Posted: Mon Mar 28, 2016 1:28 pm
by dian333
Code: Select all
"vtc0" AD00 0000 "FIGR" 0000 0B00
00   4801 0000 0067 0100 0000 78 00 0000
01   4801 99FE FFFF 10FF FFFF 99 FE FFFF
02   4801 8CFB FFFF 5DFE FFFF 22 06 0000
03   4801 0>                                           0
04   4801 C0F9 FFFF 96F5 FFFF D6 FB FFFF
05   4801 800C 0000 C1F9 FFFF 56 FE FFFF
06   4801 D6FB FFFF EA16 0000 F0 D8 FFFF

-- copied from "human_male_hunter_00.cml"
Can you explain how this maps to
"FIGR's ints are vertex1Ypos, vertex2Xpos, vertex2Ypos."?

Re: CML file structure

Posted: Mon Mar 28, 2016 2:25 pm
by Chikinface
Here's how I make it out:

The first element in human_male_hunter_00.cml is
00 - Index (as you've used)
48 01 - unknown. I ignore these bytes currently.
vertex1Ypos : 00 00 00 00
vertex2Xpos : 67 01 00 00
vertex2Ypos : 78 00 00 00

Code: Select all
Raw data:
00 48 01 00 00 00 00 67 01 00 00 78 00 00 00
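
The same breakdown as a Python sketch, assuming little-endian signed 32-bit integers and the 15-byte layout described above (1 ID byte, 2 unknown bytes, 3 ints). `parse_figr_element` is a hypothetical helper name.

```python
import struct

def parse_figr_element(buf: bytes, offset: int = 0) -> dict:
    """Decode one 15-byte FIGR element."""
    # "<" = little-endian with no padding: 1 + 2 + 3*4 = 15 bytes.
    elem_id, unknown, v1y, v2x, v2y = struct.unpack_from("<B2s3i", buf, offset)
    return {
        "id": elem_id,
        "unknown": unknown,     # the "48 01" bytes, currently ignored
        "vertex1Ypos": v1y,
        "vertex2Xpos": v2x,
        "vertex2Ypos": v2y,
    }

raw = bytes.fromhex("00 48 01 00 00 00 00 67 01 00 00 78 00 00 00")
print(parse_figr_element(raw))
# {'id': 0, 'unknown': b'H\x01', 'vertex1Ypos': 0, 'vertex2Xpos': 359, 'vertex2Ypos': 120}
```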

These correspond to the vertices on the sliders pictured below, as I can't think of a better name to call them. (Sliders taken from Nagisa's cml)


Note that Vertex 1 can only move up and down in the Y axis while Vertex 2 can move in both X and Y. The other two vertices just mirror the other side. Admittedly I'm not 100% sure which is the real vertex and which is the mirrored one, but I'm not that worried currently.

Hope that helps clear that up.

Re: CML file structure

Posted: Wed Mar 30, 2016 3:50 am
by dian333
Code: Select all
00 4801 AB F7 FFFF

-- copied from "npc_000.cml"
I used the EP4 version of "PSO2 CC" to test the character's height.
Result: the "F7" byte relates to the position of vertex 1.
If I change the value to 0x00:
vertex 1 returns to its default position (the white point)
height = 171cm
If I change the value to 0xD7:
vertex 1 is at its lowest position [the limit of SEGA's height range for a person]
height = 139cm
Height is a linear function of the value (while the value is in range).
Maybe the value is a signed 8-bit int.
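
If the two measurements above are taken at face value, the linear relation can be sketched in Python like this. It is only an interpolation between the two reported points (0x00 gives 171cm, 0xD7 gives 139cm) and relies on the guess that the byte is a signed 8-bit int; nothing outside that range has been tested, and the function names are made up.

```python
def to_int8(byte: int) -> int:
    """Interpret a byte (0..255) as a signed 8-bit integer."""
    return byte - 256 if byte >= 0x80 else byte

DEFAULT_HEIGHT_CM = 171.0   # reported for value 0x00
LOWEST_HEIGHT_CM = 139.0    # reported for value 0xD7
LOWEST_VALUE = to_int8(0xD7)  # -41 under the signed 8-bit guess

def height_cm(byte: int) -> float:
    """Linear estimate of height, valid only between the two measured points."""
    value = to_int8(byte)
    if not LOWEST_VALUE <= value <= 0:
        raise ValueError("value is outside the measured range")
    slope = (DEFAULT_HEIGHT_CM - LOWEST_HEIGHT_CM) / (0 - LOWEST_VALUE)  # cm per unit
    return DEFAULT_HEIGHT_CM + slope * value

print(round(height_cm(0xF7), 1))  # estimate for the original "F7" byte
```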

Re: CML file structure

Posted: Fri Apr 01, 2016 5:47 pm
by dian333
Code: Select all
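Body adjustment (Bodyの调整)                              00  48
Arm adjustment: arm length, whole arm (腕の调整)          01  48
Bust / chest (胸)                                         03  48
Face part placement adjustment (颜Part配置の调整)         05  48
Nose size adjustment (鼻のSize调整)                       07  48
Mouth adjustment (口の调整)                               09  48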

Re: CML file structure

Posted: Tue Apr 05, 2016 3:04 pm
by Chikinface
What are your numbers referring to?

Best I can make out from Google is that your list matches my FIGR data closely enough; I'm just not sure what your numbers are.

Re: CML file structure

Posted: Fri Apr 08, 2016 3:35 pm
by dian333
If you use the JP version (for example, PSO2 CC 4), you can see these Japanese or Chinese words when making a character.
Size matrix = { x_SCALE 0 0 0
                0 y_SCALE 0 0
                0 0 z_SCALE 0
                0 0 0 1 }
Matrix[0] .x Matrix[1] .x Matrix[2] .x Matrix[3] .x .....
This finally gives the character's bone scale.

Copied and translated from "characreate.adv.lua" (UTF-8),
about the body's bone scale parameters:

min 0.75
max 1.25
variation 0.01
default 1
Thin / fat deformation (ヤセ・太り变形)
Rugged / delicate (ゴツイ・耽美)
Male human bone scale data is stored in "pl_cmakemot_b_mh.aqm" and changes as the frame changes.
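
For anyone following along, here is a small Python sketch of the scale matrix described above, with the min/max from the lua file used as a clamp. How the game actually chains the matrices is not confirmed here; the sketch just multiplies 4x4 matrices in order, and all function names are made up for illustration.

```python
def clamp_scale(value: float, lo: float = 0.75, hi: float = 1.25) -> float:
    """Keep a bone scale inside the min/max range quoted from the lua file."""
    return max(lo, min(hi, value))

def scale_matrix(sx: float, sy: float, sz: float) -> list:
    """Build the 4x4 matrix with the scales on the diagonal."""
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul(a: list, b: list) -> list:
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Two scale matrices combine by multiplying their diagonals.
combined = matmul(scale_matrix(1.25, 1.0, 1.0), scale_matrix(1.0, 0.75, 1.0))
print([combined[i][i] for i in range(4)])  # [1.25, 0.75, 1.0, 1.0]
```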

Re: CML file structure

Posted: Sat Apr 09, 2016 9:35 pm
by Chikinface
Cool stuff! Might check some of this out after I'm done making something for a gamejam. c: One of the things that could be done with this kind of data is to essentially turn cmls into models that could be used anywhere.

The big problem is I'm pretty inexperienced with 3D data structures and the related mathematics, so it might not be possible for me to do it myself.

Re: CML file structure

Posted: Mon Apr 11, 2016 2:31 pm
by dian333
Default female body
Quna (npc_32? 33?)
Face part: W.I.P.
Whether there is more news depends on how complete it gets and on convenience.

Re: CML file structure

Posted: Mon Apr 11, 2016 2:45 pm
by dian333
--"One of the things that could be done with this kind of data is to essentially turn cmls into models that could be used anywhere."
Please don't say "anywhere". I fear...