Blendshape (Morph Target Animation)

Blendshape is the general term for a way of making 3D vertex animation (Maya calls them blend shapes, while 3ds Max calls them morph targets). They are widely used in 3D animation, especially for facial animation, where blendshapes drive a character's facial expressions.

When used for facial animation, blendshapes may be referred to as facial features, expression bases, locators, and so on. Here we need to introduce the concept of FACS (the Facial Action Coding System), which can be loosely understood as a standard for partitioning the face into sensible regions.

"Expressions look like something with infinitely many possibilities, so how can an expression be computed?

This is where blendshapes come in: a set of bases that together make up the overall expression (there may be a dozen, 50, 100+, or 200+ of them; the more there are, the finer the result). We can compute the overall expression as a linear combination of these bases. As a formula, e = b + Σᵢ dᵢ·Bᵢ, where e is the expression, B is the set of expression bases, d is the corresponding coefficients (the weights within the set), and b is the neutral face."

— From https://zhuanlan.zhihu.com/p/78174706
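
To make the linear model above concrete, here is a minimal NumPy sketch of the formula; the mesh size, the basis count, and the random deltas are purely illustrative stand-ins rather than data from a real rig.

import numpy as np

# Linear blendshape model: e = b + sum_i(d_i * B_i)
num_vertices = 1000                              # illustrative mesh size
num_bases = 52                                   # e.g. an ARKit-sized basis set

b = np.zeros((num_vertices, 3))                  # neutral face (vertex positions)
B = np.random.rand(num_bases, num_vertices, 3)   # per-basis vertex deltas from the neutral face
d = np.zeros(num_bases)                          # per-basis weights, typically in [0, 1]
d[0] = 0.8                                       # drive one basis to 80% as an example

# The expression is the neutral face plus the weighted sum of basis deltas.
e = b + np.tensordot(d, B, axes=1)               # result shape: (num_vertices, 3)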

Introduction to blendshape coefficients

ARKit defines 52 motion blendshape coefficients for facial feature positions (https://developer.apple.com/documentation/arkit/arfaceanchor/blendshapelocation). Each blendshape coefficient corresponds to an expression locator; a locator names a specific expression attribute, such as mouthSmileLeft or mouthSmileRight, and the corresponding coefficient expresses the range of that expression's motion. These 52 blendshape coefficients and their descriptions are listed below.


Each blendshape coefficient is a floating-point value ranging from 0 to 1. Take jawOpen as an example: when the user's mouth is judged to be fully closed, the returned jawOpen coefficient is 0; when the user's mouth is judged to be open as wide as possible, the returned jawOpen coefficient is 1.

In the transitional states between fully closed and wide open, jawOpen returns a value interpolated between 0 and 1 according to how wide the user's mouth is open.
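
As a sketch of how such a coefficient is typically consumed on the rendering side, the function below linearly interpolates vertex positions between a "jaw fully closed" mesh and a "jaw fully open" mesh; the two meshes and the function name are illustrative assumptions, not part of the ARKit API.

import numpy as np

def apply_jaw_open(closed_mesh: np.ndarray, open_mesh: np.ndarray, jaw_open: float) -> np.ndarray:
    # jawOpen is reported in [0, 1]; clamp defensively before blending.
    w = min(max(jaw_open, 0.0), 1.0)
    # 0 gives the fully closed mesh, 1 the fully open mesh, anything between a linear blend.
    return (1.0 - w) * closed_mesh + w * open_mesh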

Using facial motion capture

Comparison of ARKit and Vive face blendshape bases

Region | ARKit (52) | ARKit extra | VIVE (52) | VIVE extra
Brow | 5 | | 0 |
Eye | 13 | | 14 | Eye Frown + 1
Cheek | 3 | | 3 |
Nose | 2 | | 0 |
Jaw | 4 | | 4 |
Mouth | 24 | | 20 | O shape - 1
Tongue | 1 | Tongue + 7 | 11 |
Sum | 52 | 59 | 52 | 52

ARKit's 52 blendshape expression bases

Photos and a 3D model example of the ARKit face blendshapes can be found at: https://arkit-face-blendshapes.com/

CC3 ID | ARKit name (expression base / locator); the ARKit and CC3 reference pictures are omitted here
A01 | browInnerUp
A02 | browDownLeft
A03 | browDownRight
A04 | browOuterUpLeft
A05 | browOuterUpRight
A06 | eyeLookUpLeft
A07 | eyeLookUpRight
A08 | eyeLookDownLeft
A09 | eyeLookDownRight
A10 | eyeLookOutLeft
A11 | eyeLookInLeft
A12 | eyeLookInRight
A13 | eyeLookOutRight
A14 | eyeBlinkLeft
A15 | eyeBlinkRight
A16 | eyeSquintLeft
A17 | eyeSquintRight
A18 | eyeWideLeft
A19 | eyeWideRight
A20 | cheekPuff
A21 | cheekSquintLeft
A22 | cheekSquintRight
A23 | noseSneerLeft
A24 | noseSneerRight
A25 | jawOpen
A26 | jawForward
A27 | jawLeft
A28 | jawRight
A29 | mouthFunnel
A30 | mouthPucker
A31 | mouthLeft
A32 | mouthRight
A33 | mouthRollUpper
A34 | mouthRollLower
A35 | mouthShrugUpper
A36 | mouthShrugLower
A37 | mouthClose
A38 | mouthSmileLeft
A39 | mouthSmileRight
A40 | mouthFrownLeft
A41 | mouthFrownRight
A42 | mouthDimpleLeft
A43 | mouthDimpleRight
A44 | mouthUpperUpLeft
A45 | mouthUpperUpRight
A46 | mouthLowerDownLeft
A47 | mouthLowerDownRight
A48 | mouthPressLeft
A49 | mouthPressRight
A50 | mouthStretchLeft
A51 | mouthStretchRight
A52 | tongueOut
  • CC3's extra tongue blendshapes (used with an open mouth):
T01 | Tongue_Up
T02 | Tongue_Down
T03 | Tongue_Left
T04 | Tongue_Right
T05 | Tongue_Roll
T06 | Tongue_Tip_Up
T07 | Tongue_Tip_Down

Vive's facial expression bases

Vive's face-tracking set likewise has 52 blendshapes, but its bases differ substantially from Apple's.

  • Difference 1: the tongue

Apple's set is effectively 52 + 7, because within the 52 there is only a single tongue-out blendshape, while Vive's set is effectively 42 + 10. Overall, Vive's expression bases can track somewhat less facial detail.

  • Difference 2: the eyebrows

ARKit's 52 blendshapes are tracked one-to-one per facial region by the hardware. Vive, however, has no dedicated blendshapes for the eyebrow region; eyebrow motion is instead blended together with the eye motion into a single blendshape, so it is not precise one-to-one regional tracking.

The numbering and ordering below follow the order in the Unity inspector of the VIVE Eye and Facial Tracking SDK, which makes it easier for me to add expressions.

Here is my compiled table of the corresponding numbering for building the Vive bases with ARKit:

https://docs.google.com/spreadsheets/d/1kWXnqtiVbXRb1FrD5NLlxxuxbYmS0Z6YBLuIE1WwqD4/edit?usp=sharing
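
To illustrate the kind of correspondence the spreadsheet encodes, a retargeting step can start as a simple lookup table from ARKit names to Vive names. The handful of entries below are hypothetical examples of near one-to-one cases taken from the two tables that follow; they deliberately ignore the blended eyebrow/eye shapes discussed above.

# Hypothetical ARKit -> Vive retargeting entries (not a complete mapping).
ARKIT_TO_VIVE = {
    "eyeBlinkLeft": ["Eye_Left_Blink"],
    "eyeWideLeft": ["Eye_Left_Wide"],
    "jawOpen": ["Jaw_Open"],
    "mouthSmileLeft": ["Mouth_Smile_Left"],
    "cheekPuff": ["Cheek_Puff_Left", "Cheek_Puff_Right"],  # one ARKit shape feeding two Vive shapes
}

def retarget(arkit_weights):
    """Copy ARKit coefficients onto the matching Vive blendshapes."""
    vive_weights = {}
    for arkit_name, vive_names in ARKIT_TO_VIVE.items():
        for vive_name in vive_names:
            vive_weights[vive_name] = arkit_weights.get(arkit_name, 0.0)
    return vive_weights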

  • Eye Blendshapes (14 = 12 + 2)
Vive ID | Vive expression base; the Vive picture and "created from CC3 blendshapes" columns are omitted here
V01 | Eye_Left_Blink
V02 | Eye_Left_Wide
V03 | Eye_Left_Right
V04 | Eye_Left_Left
V05 | Eye_Left_Up
V06 | Eye_Left_Down
V07 | Eye_Right_Blink
V08 | Eye_Right_Wide
V09 | Eye_Right_Right
V10 | Eye_Right_Left
V11 | Eye_Right_Up
V12 | Eye_Right_Down
V13 | Eye_Left_squeeze (the blendshape that closes the eye tightly when the Eye_Left_Blink value is 100)
V14 | Eye_Right_squeeze
  • Lip Blendshapes (38 = 37 + 1)
Vive ID | Vive expression base; the Vive picture and "created from CC3 blendshapes" columns are omitted here
V15 | Jaw_Right
V16 | Jaw_Left
V17 | Jaw_Forward
V18 | Jaw_Open
V19 | Mouth_Ape_Shape
V20 | Mouth_Upper_Right
V21 | Mouth_Upper_Left
V22 | Mouth_Lower_Right
V23 | Mouth_Lower_Left
V24* | Mouth_Upper_Overturn
V25* | Mouth_Lower_Overturn
V26 | Mouth_Pout
V27 | Mouth_Smile_Right
V28 | Mouth_Smile_Left
V29 | Mouth_Sad_Right
V30 | Mouth_Sad_Left
V31 | Cheek_Puff_Right
V32 | Cheek_Puff_Left
V33 | Cheek_Suck
V34 | Mouth_Upper_UpRight
V35 | Mouth_Upper_UpLeft
V36 | Mouth_Lower_DownRight
V37 | Mouth_Lower_DownLeft
V38 | Mouth_Upper_Inside
V39 | Mouth_Lower_Inside
V40 | Mouth_Lower_Overlay
V41 | Tongue_LongStep1
V42 | Tongue_LongStep2
V43* | Tongue_Down
V44* | Tongue_Up
V45* | Tongue_Right
V46* | Tongue_Left
V47* | Tongue_Roll
V48* | Tongue_UpLeft_Morph
V49* | Tongue_UpRight_Morph
V50* | Tongue_DownLeft_Morph
V51* | Tongue_DownRight_Morph
V52* | O-shaped mouth

Extracting blendshapes with MediaPipe

The MediaPipe Face Landmarker solution was first released at Google I/O 2023 in May. It detects facial landmarks and outputs blendshape scores, which can be used to render a 3D face model that matches the user. With the MediaPipe Face Landmarker solution, KDDI and Google successfully brought realism to their virtual streamers (VTubers).

Technical implementation

Using MediaPipe's powerful and efficient Python package, KDDI's developers were able to detect a performer's facial features and extract 52 blendshapes in real time.

See also: https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/face_landmarker/python/%5BMediaPipe_Python_Tasks%5D_Face_Landmarker.ipynb

import mediapipe as mp
from mediapipe.tasks import python as mp_python
import time

MP_TASK_FILE = "face_landmarker_with_blendshapes.task"


class FaceMeshDetector:

    def __init__(self):
        # Load the face landmarker model bundle (with blendshape output).
        with open(MP_TASK_FILE, mode="rb") as f:
            f_buffer = f.read()

        # Configure the landmarker: live-stream mode with an async result callback.
        base_options = mp_python.BaseOptions(model_asset_buffer=f_buffer)
        options = mp_python.vision.FaceLandmarkerOptions(
            base_options=base_options,
            output_face_blendshapes=True,
            output_facial_transformation_matrixes=True,
            running_mode=mp.tasks.vision.RunningMode.LIVE_STREAM,
            num_faces=1,
            result_callback=self.mp_callback)

        # Create the model.
        self.model = mp_python.vision.FaceLandmarker.create_from_options(options)

        self.landmarks = None
        self.blendshapes = None
        self.latest_time_ms = 0

    def mp_callback(self, mp_result, output_image, timestamp_ms: int):
        # Keep the most recent landmarks and the per-basis blendshape scores.
        if len(mp_result.face_landmarks) >= 1 and len(mp_result.face_blendshapes) >= 1:
            self.landmarks = mp_result.face_landmarks[0]
            self.blendshapes = [b.score for b in mp_result.face_blendshapes[0]]

    def update(self, frame):
        t_ms = int(time.time() * 1000)
        # Timestamps passed to detect_async must be strictly increasing.
        if t_ms <= self.latest_time_ms:
            return

        frame_mp = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame)
        self.model.detect_async(frame_mp, t_ms)
        self.latest_time_ms = t_ms

    def get_results(self):
        return self.landmarks, self.blendshapes
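
A minimal way to drive this class end to end, assuming OpenCV (cv2) is used for camera capture; the webcam loop below is an illustrative harness rather than part of the KDDI pipeline described above.

import cv2

detector = FaceMeshDetector()
cap = cv2.VideoCapture(0)   # default webcam

try:
    while cap.isOpened():
        ok, frame_bgr = cap.read()
        if not ok:
            break
        # OpenCV captures BGR; MediaPipe expects RGB pixel order.
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        detector.update(frame_rgb)
        landmarks, blendshapes = detector.get_results()
        if blendshapes is not None:
            print(f"received {len(blendshapes)} blendshape scores")
finally:
    cap.release()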

References