3D Face Modeling for Video Synthesis

Abstract

High-quality, controllable models of the human head are an important part of countless applications in entertainment, education, telecommunication, and many other fields. We present methods for modeling and reconstructing the human head using dynamic neural scene representation networks. The approaches can be trained solely on a short monocular input video, without the need for specialized capture setups or large amounts of data. To capture the dynamics of the face, we condition the scene representation networks on additional inputs such as expression or audio data. In our experiments, we demonstrate that the models can produce photo-realistic results and allow for explicit control of pose and expression. Additionally, we analyze several variations of the core method, including the use of separate scene representation networks for the head and torso, to highlight their relative strengths and weaknesses.
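The conditioning idea described above can be sketched as follows: a scene representation network queried at a 3D point additionally receives a per-frame code (e.g. expression coefficients), so the predicted radiance field can change with the face. This is a minimal, hypothetical numpy sketch with randomly initialized weights standing in for trained parameters; the layer sizes and the name `conditioned_field` are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

POS_DIM, EXPR_DIM, HIDDEN = 3, 16, 64  # assumed, illustrative sizes

# Random weights stand in for a trained two-layer MLP.
W1 = rng.normal(0, 0.1, (POS_DIM + EXPR_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, 4))  # outputs: RGB (3) + density (1)
b2 = np.zeros(4)

def conditioned_field(xyz, expr):
    """Query the dynamic field at points `xyz` under expression code `expr`."""
    # Broadcast the per-frame expression code to every sample point,
    # then concatenate it with the 3D positions before the MLP.
    expr_tiled = np.tile(expr, (xyz.shape[0], 1))
    h = np.maximum(np.concatenate([xyz, expr_tiled], axis=1) @ W1 + b1, 0.0)
    out = h @ W2 + b2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))  # colors squashed into [0, 1]
    sigma = np.maximum(out[:, 3], 0.0)       # non-negative volume density
    return rgb, sigma

points = rng.normal(size=(8, POS_DIM))   # sample points along camera rays
expression = rng.normal(size=EXPR_DIM)   # e.g. blendshape/expression coefficients
rgb, sigma = conditioned_field(points, expression)
print(rgb.shape, sigma.shape)
```

Swapping the expression code for an audio feature vector gives the audio-driven variant; the network structure itself is unchanged, only the conditioning signal differs.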


Robin Rebggli

Bachelor's Thesis

Status:

Completed
