Nvidia AI researchers have introduced a model that generates talking-head video for video conferences from a single 2D image. It supports a wide range of manipulations, from rotating and moving a person’s head to motion transfer and video reconstruction. The model takes the first frame of a video as the source 2D photo, then uses an unsupervised learning method to extract 3D keypoints from the video. In addition to outperforming other approaches in tests on benchmark datasets, the AI matches H.264 video quality using one-tenth of the bandwidth previously required.
Nvidia research scientists Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu published a paper about the model Monday on the preprint repository arXiv. Results show the new model outperforms vid2vid, a few-shot GAN detailed in a paper published at NeurIPS last year, for which Wang was lead author and Liu a coauthor.
“By modifying the keypoint transformation only, we are able to generate free-view videos. By transmitting just the keypoint transformations, we can achieve much better compression ratios than existing methods,” the paper reads. “By dramatically reducing the bandwidth and ensuring a more immersive experience, we believe this is an important step towards the future of video conferencing.”
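To see why sending only keypoint transformations can be so cheap, consider a rough back-of-the-envelope sketch. The numbers here are illustrative assumptions, not figures from the paper: 20 keypoints, a per-frame payload made up of one 3x3 head rotation, one 3D translation, and one 3D offset per keypoint, and 16-bit values with no further compression. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KeypointPayload:
    """Hypothetical per-frame payload; the layout is an assumption, not the paper's format."""
    num_keypoints: int    # assumed keypoint count
    bytes_per_value: int  # assumed precision, e.g. 2 bytes for a 16-bit float

    def values_per_frame(self) -> int:
        rotation = 3 * 3                 # one 3x3 head rotation matrix
        translation = 3                  # one 3D head translation
        offsets = self.num_keypoints * 3  # one 3D offset per keypoint
        return rotation + translation + offsets

    def bytes_per_frame(self) -> int:
        return self.values_per_frame() * self.bytes_per_value


def bitrate_kbps(bytes_per_frame: int, fps: int) -> float:
    """Convert a per-frame payload size into kilobits per second."""
    return bytes_per_frame * 8 * fps / 1000


payload = KeypointPayload(num_keypoints=20, bytes_per_value=2)
print(payload.values_per_frame())                   # 72 numbers per frame
print(payload.bytes_per_frame())                    # 144 bytes per frame
print(bitrate_kbps(payload.bytes_per_frame(), 30))  # 34.56 kbps at 30 fps
```

Under these assumptions the per-frame payload is a fixed handful of numbers that does not grow with image resolution, since the source photo is sent once and each later frame is synthesized on the receiving end.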
The release of the model follows the October debut of Maxine, an Nvidia video conferencing service. In addition to offering virtual backgrounds like Zoom does, Maxine will deliver subtle AI-powered features like face alignment and noise reduction alongside more conspicuous ones like a conversational AI avatar or live translation.
Video calls on Microsoft Teams and Zoom also use forms of AI to blur backgrounds and power augmented reality animations and effects. The Nvidia paper was published a day before Salesforce announced plans to acquire Slack for $27 billion, news that could shake up the enterprise communications landscape and fuel the feud between Microsoft Teams and Slack. Microsoft also introduced an update to the Teams calling experience today.
Nvidia is one of the best-known companies in the world working on generative adversarial networks (GANs) like StyleGAN, models that can distort reality and blur the line between what’s real and what’s fake. Such AI models have potential applications in entertainment and gaming, but also in disinformation and the creation of fake accounts. Concern that deepfakes would accelerate misinformation leading up to the U.S. presidential election in November thankfully went unfulfilled, but GANs did enter the picture. In one instance this fall, Russian state actors used GAN-generated profile images as part of an effort to create a fake news outlet, staffed by actual Russian writers, to propel propaganda. In another incident, in 2019, an AI-generated photo was used to build a profile for Katie Jones, a fake persona who reached out to Washington, D.C. political influencers and think tank researchers.