Recent advances in learning reusable motion priors have demonstrated their effectiveness in generating naturalistic behaviors. In this paper, we propose a new learning framework in this paradigm for controlling physics-based characters with significantly improved motion quality and diversity over existing state-of-the-art methods. The proposed method uses reinforcement learning (RL) to initially track and imitate life-like movements from unstructured motion clips using the discrete information bottleneck adopted in the Vector Quantized Variational AutoEncoder (VQ-VAE). This structure compresses the most relevant information from the motion clips into a compact yet informative latent space, i.e., a discrete space over vector quantized codes. By sampling codes in this space from a trained categorical prior distribution, high-quality life-like behaviors can be generated, similar to the usage of VQ-VAE in computer vision. Although this prior distribution can be trained with the supervision of the encoder's output, it follows the original motion clip distribution in the dataset and could lead to imbalanced behaviors in our setting. To address this issue, we further propose a technique named prior shifting to adjust the prior distribution using curiosity-driven RL. The resulting distribution is demonstrated to offer sufficient behavioral diversity and significantly facilitates upper-level policy learning for downstream tasks. We conduct comprehensive experiments using humanoid characters on two challenging downstream tasks: sword-and-shield striking and a two-player boxing game. Our results demonstrate that the proposed framework is capable of controlling the character to perform movements of considerably high quality in terms of behavioral strategies, diversity, and realism.
To make the physics-based character perform lifelike movements, we first train an imitation policy to mimic a set of motion clips. Our framework employs a conditional VQ-VAE structure comprising an encoder, a decoder, and a discrete information bottleneck. This structure enables training the policy to perform a wide range of diverse movements while keeping the resulting latent space compact, i.e., a codebook. This allows for efficient reuse of the learned movements in downstream tasks. To demonstrate the generality of our framework, we evaluate it on two different humanoid characters: one equipped with a sword and a shield (37 degrees of freedom) and another equipped with boxing gloves (34 degrees of freedom). The imitation policy is trained separately for each character using motion clips totaling about half an hour, covering basic movements such as locomotion as well as specific movements such as a set of attack combos for the sword-and-shield character and boxing skills for the boxer character.
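The discrete bottleneck at the heart of this structure is straightforward to express in code. The following sketch is a minimal PyTorch illustration, not the paper's implementation: it quantizes an encoder output to its nearest codebook entry and uses the straight-through estimator to pass gradients back to the encoder. The codebook size, code dimension, and commitment weight are assumed values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal discrete bottleneck: snap encoder outputs to the nearest codebook entry."""
    def __init__(self, num_codes=256, code_dim=64, beta=0.25):  # sizes/weights are assumptions
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        nn.init.uniform_(self.codebook.weight, -1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):
        # z_e: (batch, code_dim) latent produced by the encoder
        dist = torch.cdist(z_e, self.codebook.weight)   # distances to all codes, (batch, num_codes)
        indices = dist.argmin(dim=-1)                   # discrete code index per sample
        z_q = self.codebook(indices)                    # quantized latent vectors

        # codebook loss pulls code vectors toward encoder outputs;
        # commitment loss keeps the encoder close to its chosen code
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())

        # straight-through estimator: copy gradients from z_q to z_e
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices, loss
```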
When reusing the pre-trained representation in downstream tasks, existing VAE-based methods often discard the trained encoder of the imitation policy and directly create an upper-level policy that explores the latent space to drive the fixed decoder. However, the encoder contains valuable information about the original data distribution. Here, we use the encoder to train a prior network that can generate random rollouts based on the learned movements. First, we use policy distillation to train a categorical prior network to fit the categorical distribution of the encoder given only proprioceptive observations. However, due to the imbalanced motion data, sampling codes from the trained prior distribution naturally drives the decoder to perform movements that appear more frequently in the motion data. To facilitate exploration in downstream task learning with unknown skill preferences, it is desirable to have a balanced prior capable of performing a diverse range of movements with nearly uniform probability over distinct motions in the dataset. To accomplish this, our framework fine-tunes the prior distribution with a count-based reinforcement learning approach.
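To make the two stages concrete, the sketch below pairs an illustrative policy-distillation loss, which matches the prior network's categorical output to the frozen encoder's code distribution, with a count-based bonus that decays as a code is sampled more often. The 1/sqrt(count) bonus, the codebook size, and the function names are assumptions rather than the paper's exact formulation.

```python
import numpy as np
import torch.nn.functional as F

# --- Stage 1 (distillation): fit the prior network to the encoder's code distribution ---
def distillation_loss(prior_logits, encoder_probs):
    # cross-entropy between the frozen encoder's categorical output (teacher)
    # and the prior network's prediction from proprioception only (student)
    return -(encoder_probs * F.log_softmax(prior_logits, dim=-1)).sum(dim=-1).mean()

# --- Stage 2 (prior shifting): count-based bonus for rarely used codes ---
class CodeCountBonus:
    def __init__(self, num_codes=256):                 # codebook size is an assumption
        self.counts = np.zeros(num_codes, dtype=np.int64)

    def reward(self, code):
        # bonus shrinks with visitation count, nudging the fine-tuned prior
        # toward near-uniform coverage of distinct behaviors
        self.counts[code] += 1
        return 1.0 / np.sqrt(self.counts[code])
```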
Finally, we leverage the pre-trained decoder and the shifted prior distribution, which together capture both movement quality and diversity, to effectively solve downstream tasks by training an upper-level policy. The upper-level policy takes a goal observation as input and produces a categorical distribution over codes that guides the decoder to accomplish downstream tasks. Additionally, we use a KL-divergence regularization term to constrain the upper-level policy to stay close to the pre-trained prior distribution, which encourages natural and diverse movements. To evaluate the effectiveness of our approach, we select one downstream task for each character: a strike task for the sword-and-shield character and a two-player boxing game for the boxer character. Our experiments demonstrate that the upper-level policy can be trained more efficiently and exhibits attacking and defending strategies surprisingly close to those seen in real two-player boxing matches.
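A common way to realize such KL regularization is to subtract a scaled KL divergence between the upper-level policy's code distribution and the shifted prior from the task reward. The sketch below assumes this additive form and an illustrative coefficient; it is not necessarily the paper's exact objective.

```python
import torch
import torch.distributions as D

def regularized_reward(task_reward, policy_logits, prior_logits, kl_coef=0.1):
    """Task reward minus a KL penalty keeping the upper-level policy near the prior.

    The additive form and kl_coef are illustrative assumptions.
    """
    pi = D.Categorical(logits=policy_logits)     # upper-level policy over codes
    prior = D.Categorical(logits=prior_logits)   # pre-trained (shifted) prior
    kl = D.kl_divergence(pi, prior)              # per-state KL, shape (batch,)
    return task_reward - kl_coef * kl
```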
@article{
zhu2023NCP,
author = {Zhu, Qingxu and Zhang, He and Lan, Mengting and Han, Lei},
title = {Neural Categorical Priors for Physics-Based Character Control},
year = {2023},
issue_date = {December 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {42},
number = {6},
issn = {0730-0301},
url = {https://doi.org/10.1145/3618397},
doi = {10.1145/3618397},
journal = {ACM Trans. Graph.},
month = {dec},
articleno = {178},
numpages = {16},
keywords = {reinforcement learning, VQ-VAE, multi-agent, generative model, character animation}
}