An early landmark project was the Video Rewrite program, published in 1997, which modified existing video footage of a person speaking to depict that person mouthing the words contained in a different audio track.[29] It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face.[29]
Contemporary academic projects have focused on creating more realistic videos and on improving techniques.[30][31] The "Synthesizing Obama" program, published in 2017, modifies video footage of former president Barack Obama to depict him mouthing the words contained in a separate audio track.[30] The project lists as a main research contribution its photorealistic technique for synthesizing mouth shapes from audio.[30] The Face2Face program, published in 2016, modifies video footage of a person's face to depict them mimicking the facial expressions of another person in real time.[31] The project lists as a main research contribution the first method for re-enacting facial expressions in real time using a camera that does not capture depth, making it possible for the technique to be performed using common consumer cameras.[31]
In August 2018, researchers at the University of California, Berkeley published a paper introducing a fake dancing app that can create the impression of masterful dancing ability using AI.[32][33] This project expands the application of deepfakes to the entire body; previous works focused on the head or parts of the face.[32]
Researchers have also shown that deepfakes are expanding into other domains such as tampering with medical imagery.[34] In this work, the authors showed how an attacker can automatically inject or remove lung cancer in a patient's 3D CT scan. The result was so convincing that it fooled three radiologists and a state-of-the-art lung cancer detection AI. To demonstrate the threat, the authors successfully performed the attack on a hospital in a white hat penetration test.[35]
A survey of deepfakes, published in May 2020, provides a timeline of how the creation and detection of deepfakes have advanced over the last few years.[36] The survey identifies that researchers have been focusing on resolving the following challenges of deepfake creation:
Generalization. High-quality deepfakes are often achieved by training on hours of footage of the target. The challenge is to minimize the amount of training data and the training time required to produce quality images, and to enable trained models to run on new identities (unseen during training).
Paired Training. Training a supervised model can produce high-quality results, but requires data pairing. This is the process of finding examples of inputs and their desired outputs for the model to learn from. Data pairing is laborious and impractical when training on multiple identities and facial behaviors. Some solutions include self-supervised training (using frames from the same video), the use of unpaired networks such as Cycle-GAN, or the manipulation of network embeddings.
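The cycle-consistency idea behind unpaired networks such as Cycle-GAN can be sketched briefly: two generators map between domains, and each image mapped to the other domain and back should reconstruct itself, removing the need for paired examples. The snippet below is a minimal illustration of that loss, with identity functions standing in for real generator networks:

```python
import numpy as np

def cycle_consistency_loss(G_ab, G_ba, batch_a, batch_b):
    """L1 cycle loss: a -> b -> a should reconstruct a, and vice versa.

    G_ab and G_ba are stand-ins for the two Cycle-GAN generators.
    """
    recon_a = G_ba(G_ab(batch_a))  # map into domain b, then back to a
    recon_b = G_ab(G_ba(batch_b))
    return (np.mean(np.abs(recon_a - batch_a))
            + np.mean(np.abs(recon_b - batch_b)))

# Toy "generators": identity maps reconstruct perfectly, so the loss is zero.
identity = lambda x: x
a = np.random.rand(4, 64, 64, 3)
b = np.random.rand(4, 64, 64, 3)
print(cycle_consistency_loss(identity, identity, a, b))  # 0.0
```

In a real system the two generators are trained jointly with adversarial losses; the cycle term is what allows training on two unpaired pools of faces.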
Identity leakage. This is where the identity of the driver (i.e., the actor controlling the face in a reenactment) is partially transferred to the generated face. Some solutions proposed include attention mechanisms, few-shot learning, disentanglement, boundary conversions, and skip connections.
Occlusions. When part of the face is obstructed with a hand, hair, glasses, or any other item then artifacts can occur. A common occlusion is a closed mouth which hides the inside of the mouth and the teeth. Some solutions include image segmentation during training and in-painting.
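One common way segmentation helps with occlusions is to mask the reconstruction loss so that pixels hidden by a hand, hair, or glasses do not penalize the generator. The following is a simplified sketch of such a masked loss (the mask would in practice come from a face-segmentation model, which is assumed here):

```python
import numpy as np

def masked_reconstruction_loss(generated, target, visibility_mask):
    """Mean absolute error over visible pixels only.

    visibility_mask: 1 where the face is visible, 0 where it is occluded
    (e.g. by a hand or glasses, as predicted by a segmentation model).
    """
    visible = visibility_mask.astype(bool)
    return np.abs(generated[visible] - target[visible]).mean()

target = np.ones((8, 8))
generated = np.ones((8, 8))
generated[:4, :] = 0.0   # poorly generated region...
mask = np.ones((8, 8))
mask[:4, :] = 0          # ...but it is marked occluded, so it is ignored
print(masked_reconstruction_loss(generated, target, mask))  # 0.0
```

The occluded region is then typically filled by a separate in-painting step rather than by the reconstruction objective.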
Temporal coherence. In videos containing deepfakes, artifacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal coherence losses to help improve realism. As the technology improves, these artifacts are diminishing.
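The simplest form of a temporal coherence loss penalizes large differences between consecutive generated frames. The sketch below illustrates the idea; published methods typically warp frames by optical flow before comparing them, which is omitted here for brevity:

```python
import numpy as np

def temporal_coherence_loss(frames):
    """Penalize flicker: mean absolute difference between consecutive frames.

    frames: array of shape (T, H, W, C). A simplified stand-in for the
    flow-warped temporal losses used in video synthesis research.
    """
    return np.mean(np.abs(frames[1:] - frames[:-1]))

stable = np.ones((5, 16, 16, 3))   # perfectly stable clip -> zero loss
flicker = stable.copy()
flicker[::2] = 0.0                 # alternating frames -> maximal flicker
print(temporal_coherence_loss(stable))   # 0.0
print(temporal_coherence_loss(flicker))  # 1.0
```

Adding such a term to the training objective pushes the generator to produce frames that change smoothly over time.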
Overall, deepfakes are expected to have several implications for media and society, media production, media representations, media audiences, gender, law and regulation, and politics.