Lecture 22

Faces

Thoughts

  • I saw Yaser Sheikh’s keynote presentation at HPG 2020, which covered most of Facebook’s recent work on metric telepresence. Their encoder architecture was surprisingly straightforward: it goes from the on-headset camera feed to an encoded latent representation, which is then decoded into mesh deformations and view-dependent texturing (I try to sketch the rough pipeline in code after this list). I wonder how extensible it is, though, since it seemed to require per-subject training. It is quite cool how well they can infer the whole face geometry even when much of the face is occluded by the VR headset.
  • The Facebook Reality Labs group also worked on eye texturing and tracking, which seems like an even harder problem: they end up using a separate model specifically for the eyes, since the eyes must be tracked independently by the headset and need very detailed texture. The learning architecture is similar to the original work, but it introduces an extra loss term for gaze direction and decodes both face and eye meshes/textures (a sketch of what that combined loss might look like follows the pipeline sketch below).
  • Lastly, the neural voice paper gave reasonable results from pure audio, which was pretty impressive. However, the animation was still a bit lackluster: it is certainly not yet lip-readable, and it doesn’t capture broad facial movement. I imagine it is very convenient to be able to produce decent results from only an audio recording, and these approaches will likely improve in future papers.
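
To make the encoder/decoder pipeline concrete for myself, here is a rough PyTorch-style sketch of the structure as I understood it from the talk: headset camera crops get encoded into a latent code, and a decoder turns that code (plus a view direction) into per-vertex mesh offsets and a texture. All of the class names, layer sizes, vertex counts, and texture resolutions here are my own placeholders, not the actual architecture from the paper.

```python
import torch
import torch.nn as nn


class HeadsetEncoder(nn.Module):
    """Maps on-headset camera crops to a latent code (sizes are placeholders)."""

    def __init__(self, latent_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, latent_dim)

    def forward(self, images):                # images: (B, 3, H, W)
        features = self.conv(images).flatten(1)
        return self.fc(features)              # latent code z: (B, latent_dim)


class AvatarDecoder(nn.Module):
    """Decodes the latent code into per-vertex offsets and a texture map.
    The texture branch is conditioned on view direction, which is what gives
    the view-dependent texturing."""

    def __init__(self, latent_dim=256, num_vertices=5000, tex_res=64):
        super().__init__()
        self.num_vertices = num_vertices
        self.tex_res = tex_res
        self.geom = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_vertices * 3),
        )
        self.tex = nn.Sequential(
            nn.Linear(latent_dim + 3, 512), nn.ReLU(),
            nn.Linear(512, tex_res * tex_res * 3),
        )

    def forward(self, z, view_dir):           # view_dir: (B, 3) unit vector
        offsets = self.geom(z).view(-1, self.num_vertices, 3)
        tex = self.tex(torch.cat([z, view_dir], dim=1))
        texture = tex.view(-1, 3, self.tex_res, self.tex_res)
        return offsets, texture
```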
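
And here is how I imagine the eye-model training objective might be wired up: the usual face reconstruction terms, plus eye geometry/texture terms, plus a penalty on gaze direction error. The dictionary keys and loss weights are hypothetical; this is just my guess at the shape of the objective, not the paper's actual formulation.

```python
import torch.nn.functional as F


def avatar_loss(pred, target, lambda_eye=1.0, lambda_gaze=0.1):
    """Face + eye reconstruction with an extra gaze-direction term.
    `pred` and `target` are dicts of tensors with hypothetical keys."""
    face_loss = F.mse_loss(pred["face_geometry"], target["face_geometry"]) \
              + F.mse_loss(pred["face_texture"], target["face_texture"])
    eye_loss = F.mse_loss(pred["eye_geometry"], target["eye_geometry"]) \
             + F.mse_loss(pred["eye_texture"], target["eye_texture"])
    # Penalize angular error between predicted and tracked gaze directions
    # (both assumed to be unit vectors).
    gaze_cos = F.cosine_similarity(pred["gaze_dir"], target["gaze_dir"], dim=-1)
    gaze_loss = (1.0 - gaze_cos).mean()
    return face_loss + lambda_eye * eye_loss + lambda_gaze * gaze_loss
```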
Written on April 21, 2021