An Awesome Knowledge Base for Digital Human Generation: Awesome-Talking-Head-Synthesis


GitHub: https://github.com/Kedreamix/Awesome-Talking-Head-Synthesis

This repository organizes papers, code, and resources related to generative adversarial networks (GANs) 🤗 and neural radiance fields (NeRF) 🎨, with a main focus on image-driven and audio-driven talking head synthesis papers and their released code. 👤

A collection of talking head synthesis papers and their released code. ✍️

Most papers are linked to PDFs on “arXiv” or journal/conference websites 📚. However, some papers require an academic license to view 🔐.

🔆 The Awesome-Talking-Head-Synthesis project is ongoing, and pull requests are welcome! If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit and submit a PR. You can also open an issue or contact me directly via email. 📩

⭐ If you find this repo useful, please give it a star! 🤩

2023.12 Update 📆

Thanks to https://github.com/Curated-Awesome-Lists/awesome-ai-talking-heads, I have added some of its content, such as the Tools & Software and Slides & Presentations sections. 🙏 I hope this will be helpful. 😊

If you have any feedback or ideas on extending this aggregated resource, please open an issue or PR - community contributions are vital to advancing this shared knowledge. 🤝

Let’s keep pushing forward to recreate ever more realistic digital human faces! 💪 We’ve come so far but still have a long way to go. With continued research 🔬 and collaboration, I’m sure we’ll get there! 🤗

Please feel free to star ⭐ and share this repo if you find it a valuable resource. Your support helps motivate me to keep maintaining and improving it. 🥰 Let me know if you have any other questions!

Datasets


| Dataset | Download Link | Description |
| --- | --- | --- |
| FaceForensics++ | Download link | |
| CelebV | Download link | |
| VoxCeleb | Download link | VoxCeleb, a comprehensive audio-visual dataset for speaker recognition, encompasses both the VoxCeleb1 and VoxCeleb2 datasets. |
| VoxCeleb1 | Download link | VoxCeleb1 contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube. |
| VoxCeleb2 | Download link | Extracted from YouTube videos, VoxCeleb2 includes video URLs and utterance timestamps. As the largest public audio-visual dataset, it is primarily used for speaker recognition tasks. However, it can also be used for training talking-head generation models. To obtain download permission and access the dataset, apply here. Requires 300 GB+ of storage space. |
| ObamaSet | Download link | ObamaSet is a specialized audio-visual dataset focused on analyzing the visual speech of former US President Barack Obama. All video samples are collected from his weekly address footage. Unlike previous datasets, it exclusively centers on Barack Obama and does not provide any human annotations. |
| TalkingHead-1KH | Download link | The dataset consists of 500k video clips, of which about 80k are greater than 512x512 resolution. Only videos under permissive licenses are included. Note that the number of videos differs from that in the original paper because a more robust preprocessing script was used to split the videos. |
| LRW (Lip Reading in the Wild) | Download link | LRW, a diverse English-speaking video dataset from BBC programs, features over 1000 speakers with various speaking styles and head poses. Each video is 1.16 seconds long (29 frames) and contains the target word along with context. |
| MEAD 2020 | Download link | MEAD 2020 is a talking-head dataset annotated with emotion labels and intensity labels. The dataset focuses on facial generation for natural emotional speech, covering eight different emotions at three intensity levels. |
| CelebV-HQ | Download link | CelebV-HQ is a high-quality video dataset comprising 35,666 clips with a resolution of at least 512x512. It includes 15,653 identities, and each clip is manually labeled with 83 facial attributes, spanning appearance, action, and emotion. The dataset’s diversity and temporal coherence make it a valuable resource for tasks like unconditional video generation and video facial attribute editing. |
| HDTF | Download link | HDTF, the High-definition Talking-Face Dataset, is a large in-the-wild high-resolution audio-visual dataset consisting of approximately 362 different videos totaling 15.8 hours. Original video resolutions are 720p or 1080p, and each cropped video is resized to 512 × 512. |
| CREMA-D | Download link | CREMA-D is a diverse dataset with 7,442 original clips featuring 91 actors, including 48 male and 43 female actors aged 20 to 74, representing various races and ethnicities. The dataset includes recordings of actors speaking from a set of 12 sentences, expressing six different emotions (Anger, Disgust, Fear, Happy, Neutral, and Sad) at four emotion levels (Low, Medium, High, and Unspecified). Emotion and intensity ratings were gathered through crowd-sourcing, with 2,443 participants rating 90 unique clips each (30 audio, 30 visual, and 30 audio-visual). Over 95% of the clips have more than 7 ratings. For additional details on CREMA-D, refer to the paper link. |
| LRS2 | Download link | LRS2 is a lip reading dataset that includes videos recorded in diverse settings, suitable for studying lip reading and visual speech recognition. |
| GRID | Download link | The GRID dataset was recorded in a laboratory setting with 34 volunteers, each speaking 1000 phrases, totaling 34,000 utterance instances. Phrases follow specific rules, with six words randomly selected from six categories: “command,” “color,” “preposition,” “letter,” “number,” and “adverb.” Access the dataset here. |
| SAVEE | Download link | The SAVEE (Surrey Audio-Visual Expressed Emotion) database is a crucial component for developing an automatic emotion recognition system. It features recordings from 4 male actors expressing 7 different emotions, totaling 480 British English utterances. These sentences, selected from the standard TIMIT corpus, are phonetically balanced for each emotion. Recorded in a high-quality visual media lab, the data undergoes processing and labeling. Performance evaluation involves 10 subjects rating recordings under audio, visual, and audio-visual conditions. Classification systems for each modality achieve speaker-independent recognition rates of 61%, 65%, and 84% for audio, visual, and audio-visual, respectively. |
| BIWI (3D) | Download link | The Biwi 3D Audiovisual Corpus of Affective Communication serves as a compromise between data authenticity and quality, acquired at ETHZ in collaboration with SYNVO GmbH. |
| VOCA | Download link | VOCA is a 4D-face dataset with approximately 29 minutes of 4D face scans and synchronized audio from 12 speakers. It greatly facilitates research in 3D visual speech generation (VSG). |
| Multiface (3D) | Download link | The Multiface Dataset consists of high-quality multi-view video recordings of 13 people displaying various facial expressions. It contains approximately 12,200 to 23,000 frames per subject, captured at 30 fps from around 40 to 160 camera views with uniform lighting. The dataset’s size is 65 TB and includes raw images (2048x1334 resolution), tracked and meshed heads, 1024x1024 unwrapped face textures, camera calibration metadata, and audio. This repository provides code for downloading the dataset and building a codec avatar using a deep appearance model. |
| MMFace4D | Download link | The MMFace4D dataset is a large-scale multi-modal dataset for audio-driven 3D facial animation research. It contains over 35,000 sequences captured from 431 subjects ranging in age from 15 to 68 years old. Various sentences from scenarios such as news broadcasting, conversations, and storytelling were recorded, totaling around 11,000 utterances. High-fidelity data was captured using three synchronized RGB-D cameras to obtain high-resolution 3D meshes and textures. A reconstruction pipeline was developed to fuse the multi-view data and generate topology-consistent 3D mesh sequences. In addition to the 3D facial motions, synchronized speech audio is also provided. The final dataset covers a wide range of expressive talking styles and facial expressions through a diverse set of subjects and utterances. With its large scale, high data quality, and strong diversity, the MMFace4D dataset provides an ideal benchmark for developing and evaluating audio-driven 3D facial animation models. |
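
The GRID entry above describes phrases built from six word categories in a fixed order. A minimal sketch of that structure follows. The word lists are an assumption recalled from the GRID corpus description rather than taken from this page, and the function names are hypothetical.

```python
import random

# Assumed GRID vocabulary, one list per category, in sentence order.
# The category names come from the dataset description above; the word
# lists themselves are an assumption and should be checked against the corpus.
GRID_CATEGORIES = [
    ("command", ["bin", "lay", "place", "set"]),
    ("color", ["blue", "green", "red", "white"]),
    ("preposition", ["at", "by", "in", "with"]),
    ("letter", list("abcdefghijklmnopqrstuvxyz")),  # 25 letters, "w" excluded
    ("number", ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]),
    ("adverb", ["again", "now", "please", "soon"]),
]


def sample_grid_phrase(rng):
    """Draw one word per category, in order, and join them into a phrase."""
    return " ".join(rng.choice(words) for _, words in GRID_CATEGORIES)


def is_valid_grid_phrase(phrase):
    """Check that a phrase has six words, each from the matching category."""
    tokens = phrase.lower().split()
    if len(tokens) != len(GRID_CATEGORIES):
        return False
    return all(tok in words for tok, (_, words) in zip(tokens, GRID_CATEGORIES))


def count_grid_phrases():
    """Number of distinct phrases the assumed grammar can produce."""
    total = 1
    for _, words in GRID_CATEGORIES:
        total *= len(words)
    return total
```

For example, `sample_grid_phrase(random.Random(0))` draws one phrase, and `is_valid_grid_phrase("bin blue at f two now")` checks a transcript against the assumed grammar, which can help when filtering alignment files before training.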

Survey

| Year | Title | Conference/Journal |
| --- | --- | --- |
| 2024 | A Survey on 3D Gaussian Splatting (3DGS 🔥🔥🔥, ongoing) | arXiv 2024 |
| 2024 | Neural Radiance Fields: Past, Present, and Future (NeRF 🔥🔥🔥, amazing 413 pages) | arXiv 2024 |
| 2023 | From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications | arXiv 2023 |
| 2023 | Human-Computer Interaction System: A Survey of Talking-Head Generation | IEEE |
| 2023 | Talking human face generation: A survey | ACM |
| 2022 | Deep Learning for Visual Speech Analysis: A Survey | arXiv 2022 |
| 2020 | What comprises a good talking-head video generation?: A Survey and Benchmark | arXiv 2020 |

Funny Work

| Year | Title | Code | Project | Keywords |
| --- | --- | --- | --- | --- |
| 2024 | [Audio2Photoreal] From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations | Code | Project | Photoreal |
| 2024 | [Animate Anyone] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | Code | Project | 🔥Animate (Alibaba; drives the viral "Kemu San" dance) |
| 2024 | [3DGAN] What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs | | Project | 🔥Nvidia |

Audio-driven

| Year | Title | Conference/Journal | Code | Project | Keywords |
| --- | --- | --- | --- | --- | --- |
| 2024 | [Real3D-Portrait] Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | ICLR 2024 | | Project | 3D, One-Shot, Realistic |
| 2024 | [AdaMesh] AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation | arXiv 2024 | Code | Project | 3D, Mesh |
| 2024 | [DREAM-Talk] DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation | arXiv 2024 | | Project | Emotion |
| 2024 | [AE-NeRF] AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis | AAAI 2024 | | | |
| 2024 | [VectorTalker] VectorTalker: SVG Talking Face Generation with Progressive Vectorisation | arXiv 2024 | | | SVG |
| 2024 | [Mimic] Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation | AAAI 2024 | | | 3D |
| 2024 | [DreamTalk] DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models | arXiv 2024 | Code | Project | Diffusion |
| 2024 | [FaceTalk] FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models | arXiv 2024 | Code | Project | |
| 2024 | [GSmoothFace] GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance | arXiv 2024 | | | 3D |
| 2024 | [GMTalker] GMTalker: Gaussian Mixture based Emotional talking video Portraits | arXiv 2024 | | Project | Emotion |
| 2024 | [VividTalk] VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior | arXiv 2024 | | | Mesh |
| 2024 | [GAIA] GAIA: Zero-shot Talking Avatar Generation | arXiv 2024 | Code (coming) | Project | 😲😲😲 |
| 2023 | Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation | ICCV 2023 | Code | Project | - |
| 2023 | [ToonTalker] ToonTalker: Cross-Domain Face Reenactment | ICCV 2023 | - | - | - |
| 2023 | Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation | ICCV 2023 | Code | Project | - |
| 2023 | [EMMN] EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation | ICCV 2023 | - | - | Emotion |
| 2023 | Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation | ICCV 2023 | - | - | Emotion, LHG |
| 2023 | [MODA] MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions | ICCV 2023 | - | - | - |
| 2023 | [Facediffuser] Facediffuser: Speech-driven 3d facial animation synthesis using diffusion | ACM SIGGRAPH MIG 2023 | Code | Project | 🔥Diffusion, 3D |
| 2023 | Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis | TCSVT 2023 | - | - | |
| 2023 | [SadTalker] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation | CVPR 2023 | Code | Project | 3D, Single Image |
| 2023 | [EmoTalk] EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation | ICCV 2023 | Code | | 3D, Emotion |
| 2023 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | InterSpeech 2023 | | | Emotion |
| 2023 | [DINet] DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video | AAAI 2023 | Code | - | |
| 2023 | [StyleTalk] StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | AAAI 2023 | Code | - | Style |
| 2023 | High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning | CVPR 2023 | - | - | Emotion |
| 2023 | [StyleSync] StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator | CVPR 2023 | Code | Project | - |
| 2023 | [TalkLip] TalkLip: Seeing What You Said - Talking Face Generation Guided by a Lip Reading Expert | CVPR 2023 | Code | - | - |
| 2023 | [CodeTalker] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior | CVPR 2023 | Code | Project | 3D, codebook |
| 2023 | [EmoGen] Emotionally Enhanced Talking Face Generation | arXiv 2023 | Code | - | Emotion |
| 2023 | [DAE-Talker] DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder | arXiv 2023 | - | Project | 🔥Diffusion |
| 2023 | [READ] READ Avatars: Realistic Emotion-controllable Audio Driven Avatars | arXiv 2023 | - | - | - |
| 2023 | [DiffTalk] DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis | CVPR 2023 | Code | Project | 🔥Diffusion |
| 2023 | [Diffused Heads] Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | arXiv 2023 | - | Project | 🔥Diffusion |
| 2022 | [GC-AVT] Expressive Talking Head Generation with Granular Audio-Visual Control | CVPR 2022 | - | - | - |
| 2022 | Talking Face Generation with Multilingual TTS | CVPR 2022 | Demo Track | - | - |
| 2022 | [EAMM] EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model | SIGGRAPH 2022 | - | - | Emotion |
| 2022 | [SPACEx] SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression | arXiv 2022 | - | Project | - |
| 2022 | [AV-CAT] Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers | SIGGRAPH Asia 2022 | - | - | - |
| 2022 | [MemFace] Memories are One-to-Many Mapping Alleviators in Talking Face Generation | arXiv 2022 | - | - | - |
| 2021 | [PC-AVS] PC-AVS: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | CVPR 2021 | Code | Project | - |
| 2021 | [IATS] Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | ACM MM 2021 | - | - | - |
| 2021 | [Speech2Talking-Face] Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation | IJCAI 2021 | - | - | - |
| 2021 | [FAU] Talking Head Generation with Audio and Speech Related Facial Action Units | BMVC 2021 | - | - | AU |
| 2021 | [EVP] Audio-Driven Emotional Video Portraits | CVPR 2021 | Code | - | Emotion |
| 2020 | [Wav2Lip] A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild | ACM Multimedia 2020 | Code | Project | - |
| 2020 | [RhythmicHead] Talking-head Generation with Rhythmic Head Motion | ECCV 2020 | Code | - | - |
| 2020 | [MakeItTalk] Speaker-Aware Talking-Head Animation | SIGGRAPH Asia 2020 | Code | Project | - |
| 2020 | [Neural Voice Puppetry] Audio-driven Facial Reenactment | ECCV 2020 | - | Project | - |
| 2020 | [MEAD] A Large-scale Audio-visual Dataset for Emotional Talking-face Generation | ECCV 2020 | Code | Project | - |
| 2020 | Realistic Speech-Driven Facial Animation with GANs | IJCV 2020 | - | - | - |
| 2019 | [DAVS] Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | AAAI 2019 | Code | - | - |
| 2019 | [ATVGnet] Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss | CVPR 2019 | Code | - | - |
| 2018 | Lip Movements Generation at a Glance | ECCV 2018 | Code | - | - |
| 2018 | [VisemeNet] Audio-Driven Animator-Centric Speech Animation | SIGGRAPH 2018 | - | - | - |
| 2017 | [Synthesizing Obama] Learning Lip Sync From Audio | SIGGRAPH 2017 | - | Project | - |
| 2017 | [You Said That?] Synthesising Talking Faces From Audio | BMVC 2017 | Code | - | - |
| 2017 | Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion | SIGGRAPH 2017 | - | - | - |
| 2017 | A Deep Learning Approach for Generalized Speech Animation | SIGGRAPH 2017 | - | - | - |
| 2016 | [LRW] Lip Reading in the Wild | ACCV 2016 | - | - | - |

Text-driven

| Year | Title | Conference/Journal | Code/Proj |
| --- | --- | --- | --- |
| 2023 | TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles | arXiv | |
| 2021 | Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | AAAI | Code |
| 2021 | Txt2vid: Ultra-low bitrate compression of talking-head videos via text | arXiv | Code |

NeRF & 3D & Gaussian Splatting

| Year | Title | Conference/Journal | Code | Project | Keywords |
| --- | --- | --- | --- | --- | --- |
| 2024 | [UltrAvatar] UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures | arXiv 2024 | | Project | Diffusion, Avatar |
| 2024 | [GaussianBody] GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting | arXiv 2024 | | | 🔥Gaussian Splatting |
| 2024 | FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF | | Code | | 4D face video editor |
| 2024 | [AGG] AGG: Amortized Generative 3D Gaussians for Single Image to 3D | arXiv 2024 | | Project | 🔥Gaussian Splatting |
| 2024 | Gaussian Shadow Casting for Neural Characters | arXiv 2024 | | | 🔥Gaussian Splatting |
| 2024 | [Human101] Human101: Training 100+FPS Human Gaussians in 100s from 1 View | arXiv 2024 | | Project | 🔥Gaussian Splatting |
| 2024 | Deformable 3D Gaussian Splatting for Animatable Human Avatars | arXiv 2024 | | | 🔥Gaussian Splatting |
| 2024 | [4DGen] 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency | arXiv 2024 | | Project | 🔥Gaussian Splatting |
| 2024 | [3DGAN] What You See Is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANs | arXiv 2024 | | Project | |
| 2024 | [3DGS-Avatar] 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting | arXiv 2024 | Code | Project | 🔥Gaussian Splatting |
| 2024 | Learning Dense Correspondence for NeRF-Based Face Reenactment | AAAI 2024 | | | One-shot multi-view face reenactment |
| 2024 | [R2-Talker] R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning | arXiv 2024 | | | Based on RAD-NeRF |
| 2024 | [GaussianHead] GaussianHead: Impressive 3D Gaussian-based Head Avatars with Dynamic Hybrid Neural Field | arXiv 2024 | Code | | 🔥Gaussian Splatting |
| 2024 | [MonoGaussianAvatar] MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar | arXiv 2024 | | | 🔥Gaussian Splatting |
| 2024 | [Gaussian Head Avatar] Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians | arXiv 2024 | Code | Project | |
| 2024 | [HeadGaS] HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting | arXiv 2024 | | | 🔥Gaussian Splatting |
| 2024 | [GaussianAvatars] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians | arXiv 2024 | | Project | 🔥Gaussian Splatting |
| 2024 | [SyncTalk] SyncTalk: The Devil😈 is in the Synchronization for Talking Head Synthesis | CVPR 2024? | Code | Project | 😈 |
| 2024 | [DT-NeRF] DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | ICASSP 2024 | - | - | ER-NeRF |
| 2023 | [ER-NeRF] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | ICCV 2023 | Code | Project | Tri-plane |
| 2023 | [LipNeRF] LipNeRF: What is the right feature space to lip-sync a NeRF? | FG 2023 | Code | Project | Wav2lip |
| 2023 | [SD-NeRF] SD-NeRF: Towards Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs | IEEE 2023 | - | - | |
| 2023 | [Instruct-NeuralTalker] Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions | arXiv 2023 | | | |
| 2023 | [GeneFace++] Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation | arXiv 2023 | - | Project | - |
| 2023 | [GeneFace] GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR 2023 | Code | Project | - |
| 2022 | [RAD-NeRF] RAD-NeRF: Real-time Neural Talking Portrait Synthesis | arXiv 2022 | Code | Project | InstantNGP |
| 2022 | [DFRF] DFRF: Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | ECCV 2022 | Code | Project | |
| 2022 | [DialogueNeRF] DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation | arXiv 2022 | - | - | - |
| 2022 | [NeRFInvertor] NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation | arXiv 2022 | Code | Project | - |
| 2022 | [Next3D] Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars | arXiv 2022 | Code | Project | - |
| 2022 | [3DFaceShop] 3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation | arXiv 2022 | Code | Project | - |
| 2022 | [FNeVR] FNeVR: Neural Volume Rendering for Face Animation | arXiv 2022 | Code | - | - |
| 2022 | [ROME] ROME: Realistic One-shot Mesh-based Head Avatars | ECCV 2022 | Code | Project | - |
| 2022 | [IMavatar] IMavatar: Implicit Morphable Head Avatars from Videos | CVPR 2022 | Code | Project | - |
| 2022 | [HeadNeRF] HeadNeRF: A Real-time NeRF-based Parametric Head Model | CVPR 2022 | Code | Project | - |
| 2022 | [SSP-NeRF] Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation | arXiv 2022 | Code | Project | - |
| 2021 | [AD-NeRF] AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | ICCV 2021 | Code | Project | - |
| 2021 | [NerFACE] NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction | CVPR 2021 Oral | Code | Project | - |
| 2021 | [DFA-NeRF] DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering | arXiv 2021 | Code | - | - |

Metrics

| Metric | Paper | Link |
| --- | --- | --- |
| PSNR (peak signal-to-noise ratio) | - | |
| SSIM (structural similarity index measure) | Image quality assessment: from error visibility to structural similarity. | |
| CPBD (cumulative probability of blur detection) | A no-reference image blur metric based on the cumulative probability of blur detection | |
| LPIPS (Learned Perceptual Image Patch Similarity) | The Unreasonable Effectiveness of Deep Features as a Perceptual Metric | paper |
| NIQE (Natural Image Quality Evaluator) | Making a ‘Completely Blind’ Image Quality Analyzer | paper |
| FID (Fréchet inception distance) | GANs trained by a two time-scale update rule converge to a local nash equilibrium | |
| LMD (landmark distance error) | Lip Movements Generation at a Glance | |
| LRA (lip-reading accuracy) | Talking Face Generation by Conditional Recurrent Adversarial Network | paper |
| WER (word error rate) | Lipnet: end-to-end sentence-level lipreading. | |
| LSE-D (Lip Sync Error - Distance) | Out of time: automated lip sync in the wild | |
| LSE-C (Lip Sync Error - Confidence) | Out of time: automated lip sync in the wild | |
| ACD (average content distance) | Facenet: a unified embedding for face recognition and clustering. | |
| CSIM (cosine similarity) | Arcface: additive angular margin loss for deep face recognition. | |
| EAR (eye aspect ratio) | Real-time eye blink detection using facial landmarks. In: Computer Vision Winter Workshop | |
| ESD (emotion similarity distance) | What comprises a good talking-head video generation?: A Survey and Benchmark | |
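
Several of the metrics listed above reduce to a few lines of arithmetic once the inputs (images, landmarks, embeddings, or transcripts) are available. The sketch below uses the commonly cited textbook definitions with NumPy. It is not the evaluation code of any listed paper, and the function names are hypothetical. In practice, papers often add steps not shown here, such as face alignment before LMD, restricting LMD to mouth landmarks, or extracting CSIM embeddings with a face recognition network.

```python
import numpy as np


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE) for two same-shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    gen = np.asarray(generated, dtype=np.float64)
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def landmark_distance(ref_landmarks, gen_landmarks):
    """LMD: mean Euclidean distance between corresponding landmarks.

    Both inputs are assumed to have shape (frames, points, 2).
    """
    ref = np.asarray(ref_landmarks, dtype=np.float64)
    gen = np.asarray(gen_landmarks, dtype=np.float64)
    return float(np.mean(np.linalg.norm(ref - gen, axis=-1)))


def cosine_similarity(a, b):
    """CSIM: cosine similarity between two identity embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def eye_aspect_ratio(eye):
    """EAR from six eye landmarks p1..p6 (p1 and p4 are the eye corners):
    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|).
    """
    p = np.asarray(eye, dtype=np.float64)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = np.linalg.norm(p[0] - p[3])
    return float(vertical / (2.0 * horizontal))


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,                           # deletion
                           cur[j - 1] + 1,                        # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1] / len(ref)
```

Higher PSNR and CSIM indicate closer agreement with the reference, while lower LMD and WER are better. EAR is usually tracked over time, with a drop toward zero indicating a blink.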

Tools & Software

| Tool/Resource | Description |
| --- | --- |
| LUCIA | Development of a MPEG-4 Talking Head Engine. 💻 |
| Yepic Studio | Create and dub talking head-style videos in minutes without expensive equipment. 🎥 |
| Mel McGee’s Talkbots | A complete multi-browser, multi-platform talking head application in SVG suitable for web sites or as an avatar. 🗣️ |
| face3D_chung | Create 3D character avatar head objects with texture from a single photo for your games. 🎮 |
| CrazyTalk | Exciting features for 3D head creation and automation. 🤪 |
| tts avatar free download - SourceForge | Mel McGee’s Talkbots is a complete multi-browser, multi-platform talking head. (🔧👄) |
| Verbatim AI - Product Information, Latest Updates, and Reviews 2023 | A simple yet powerful API to generate AI “talking head” videos in near real-time with Verbatim AI. Add interest, intrigue, and dynamism to your chat bots! (🔧👄) |
| Best Open Source BASIC 3D Modeling Software | Includes talk3D_chung, a small example using obj models created with face3D_chung, and speak3D_chung_dll, a dll to load and display face3D_chung talking avatars. (🛠️🎭) |
| DVDStyler / Discussion / Help: ffmpeg-vbr or internal | Talking heads would get a bitrate which is unnecessarily high while using DVDStyler. (🛠️👄) |
| puffin web browser free download - SourceForge | Mel McGee’s Talkbots is a complete multi-browser, multi-platform talking head. (🔧👄) |
| 12 best AI video generators to use in 2023 Free and paid \|Product … | Whether you’re an entrepreneur, small business owner, or run a large company, AI video generators make it super easy to create high-quality videos from scratch. (🔧🎥) |

Slides & Presentations

| Presentation Title | Description |
| --- | --- |
| Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | Presentation reviewing the few-shot adversarial learning of realistic neural talking head models. |
| Nethania Michelle’s Character | PPT: Presentation discussing the improvement of a 3D talking head for use in an avatar of a virtual meeting room. |
| Presenting you: Top tips on presenting with Prezi Video – Prezi | Article providing top tips for presenting with Prezi Video. |
| Research Presentation | PPT: Resident Research Presentation Slide Deck. |
| Adding narration to your presentation (using Prezi Video) – Prezi | Learn how to add narration to your Prezi presentation with Prezi Video. |


Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!