Automatic objective metric for the optimization of nonverbal behavior generative models

Abstract

Evaluating the quality of generated nonverbal behavior remains a major challenge in the development of generative models. While human evaluations are reliable, they are costly and impractical for large-scale or iterative optimization. In this work, we propose an objective evaluation framework based on aggregated ranks across multiple fidelity and diversity metrics, computed from both raw features and learned latent representations. Compared to existing works, our framework emphasizes consistency across multiple metrics, aiming to provide a more holistic assessment.
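As a rough illustration of what aggregating ranks across metrics can look like, the sketch below ranks each model on every metric and averages those ranks into one composite score. It is a minimal sketch under stated assumptions, not the procedure used in this work: the abstract does not say how ranks are combined, so mean-rank aggregation is assumed here, and the model names, metric names, and values are all hypothetical.

```python
from statistics import mean


def rank_values(values, higher_is_better):
    """Return 1-based ranks (1 = best); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def aggregate_ranks(metrics, higher_is_better):
    """Average each model's rank over all metrics (assumed aggregation rule).

    metrics: {metric_name: {model_name: value}}
    higher_is_better: {metric_name: bool}
    """
    models = sorted(next(iter(metrics.values())))
    per_model = {m: [] for m in models}
    for name, values in metrics.items():
        ranks = rank_values([values[m] for m in models], higher_is_better[name])
        for model, rank in zip(models, ranks):
            per_model[model].append(rank)
    return {m: mean(r) for m, r in per_model.items()}


# Hypothetical values: one fidelity metric (a distance, lower is better)
# and one diversity metric (higher is better).
example_metrics = {
    "fidelity_distance": {"A": 0.2, "B": 0.5, "C": 0.3},
    "diversity": {"A": 0.8, "B": 0.9, "C": 0.5},
}
example_directions = {"fidelity_distance": False, "diversity": True}

composite = aggregate_ranks(example_metrics, example_directions)
final_ranking = sorted(composite, key=composite.get)
```

A lower composite score means a model ranks well consistently across metrics, rather than excelling on one metric only. The same functions apply whether the metrics are computed from raw features or from latent representations, since only the per-metric values enter the ranking.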

Radar chart

Radar chart comparing the performance of four CWGAN-GP variants. Each polygon represents a model configuration defined by specific values of the loss parameters α and β. The legend indicates the final aggregated ranking for each configuration based on the composite score.

Generated nonverbal behaviors replayed on virtual agents

The videos present segments of nonverbal behaviors, each generated by one of the CWGAN-GP variants. The generated behaviors are then replayed on a virtual character through the Retorik platform developed by Davi. The original audio from the video is preserved.

α = 0, β = 1

α = 0.01, β = 0.99

α = 0.1, β = 0.9

α = 0.5, β = 0.5

Real nonverbal behaviors extracted and replayed

This video presents a segment of nonverbal behaviors from a real recorded sequence. The behaviors were extracted using the OpenFace tool and then replayed on a virtual character through the Retorik platform developed by Davi. The original audio from the video is preserved.
