The skeleton bone hierarchy is the most important part of a character/avatar standard. The Reference Canonical Skeleton Framework proposal already covers this - great! However, there is more than just the skeleton bones that could be defined.
I noticed that the CSV document defines a bunch of face bones, such as nose, chin, cheek, mouth, lip, brow, etc: https://github.com/meshula/LabRCSF/blob/b99b7742d2802679b88f4081ce5eda44a9cef57a/joints.csv
I'm not against having such bones defined, especially if it allows OpenUSD bones to be consistently transcribed to/from RCSF and glTF. However, from what I have seen, facial animation is far more commonly done with blend shapes, which are called shape keys in Blender and morph targets in glTF.
In my specification, G4MF, I have attempted to define standard names for these. Character/avatar blend shapes can be divided into 2 groups:
- Visemes, short for "visual phonemes", are blend shapes that represent the visual aspect of phonemes during human speech. They are used for lip syncing, the process of matching a character's mouth movements to spoken words.
  - I have a proposed set of names here, as a superset of what platforms provide: G4MF Viseme Blend Shape Naming.
  - These names are basically the same as those used by other platforms, except with a consistent "Viseme" prefix on all of them. Some platforms use "E" instead of "EE", and VRM uses single letters only, but I went with always 2 letters for consistency and clarity.
  - As far as I can tell, my definition is the only one to actually include IPA symbols. Others use English letters only, sometimes with pictures, which leaves the sounds ambiguous, especially for a non-English speaker.
- Face tracking is the process of using a camera to track a person's facial movements, and applying those movements to a character/avatar's face. While visemes are used for lip syncing to audio, face tracking is used to mimic a much broader range of facial expressions.
  - I have a proposed set of names here, as a superset of what platforms provide: G4MF Face Tracking Blend Shape Naming.
  - These are basically the same as "Unified Expressions" from VRCFaceTracking, which already did the work of unifying several face tracking standards together, except that I prefixed all the names with "Face".
  - I am far from an expert on face tracking, so I'm basically just trusting that VRCFT did a good job. I'd say that more research is required before declaring a standard. In the meantime, see their docs, which are much more detailed.
- Aside from those 2 groups, there are plenty of models out there with blend shapes like "Happy", "Sad", "Angry", etc. For these, any level of standardization feels inappropriate, so I propose that we ignore them.
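To illustrate why the consistent prefixes are useful, here is a minimal Python sketch of how an importer might sort a mesh's blend shape names into the groups above. Only the "Viseme" and "Face" prefixes come from the proposal. The function name `classify_blend_shapes`, the group labels, and the sample names ("VisemeAA", "FaceJawOpen", "Facepalm") are hypothetical and chosen purely for illustration; the actual name lists are in the G4MF documents.

```python
from collections import defaultdict

# The two prefixes come from the proposal text; the group labels are
# made up for this sketch.
GROUP_PREFIXES = {"Viseme": "viseme", "Face": "face_tracking"}


def classify_blend_shapes(names):
    """Group blend shape names by their standardized prefix."""
    groups = defaultdict(list)
    for name in names:
        for prefix, group in GROUP_PREFIXES.items():
            rest = name[len(prefix):]
            # Require an uppercase letter right after the prefix so that an
            # unrelated name such as "Facepalm" is not treated as face tracking.
            if name.startswith(prefix) and rest[:1].isupper():
                groups[group].append(name)
                break
        else:
            # Anything without a known prefix (e.g. "Happy") is left alone.
            groups["unstandardized"].append(name)
    return dict(groups)


# Hypothetical sample names, not taken from any real model.
example = ["VisemeAA", "FaceJawOpen", "Happy", "Facepalm"]
print(classify_blend_shapes(example))
# {'viseme': ['VisemeAA'], 'face_tracking': ['FaceJawOpen'], 'unstandardized': ['Happy', 'Facepalm']}
```

A real importer would likely check names against the full published lists rather than the prefix alone; the prefix check is just the cheapest way to show the grouping.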
With these in place, it is possible that we don't have a need for the facial bones, but again, I'm not necessarily proposing that they be removed if others have a genuine use for them. Regardless, defining blend shapes is vital for enabling interoperability for characters/avatars that have them. It is possible that blend shapes are out of scope for the RCSF repo, but they should still be considered, and they still have to be defined somewhere.