High-Quality Lip Sync with wav2lip-HQ

cedro

3 years ago

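1. Introduction

I previously introduced wav2lip, which moves the mouth of a person in a video to match an audio track. This time I introduce its high-resolution version, wav2lip-HQ.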

2. What is wav2lip-HQ?

It uses image super-resolution (ESRGAN) and face segmentation (face_segmentation) to improve the visual quality of lip-synced videos.
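As a rough illustration of how the two components could work together, the sketch below pastes an enhanced image back into a frame using a segmentation mask. This is not the repository's code: `blend_with_mask` and the toy arrays are hypothetical, and the actual pipeline may combine the ESRGAN output and the segmentation mask differently.

```python
import numpy as np

def blend_with_mask(frame, enhanced, mask):
    """Composite `enhanced` over `frame` where `mask` is close to 1.

    frame, enhanced: float arrays of shape (H, W, 3) with values in [0, 1]
    mask: float array of shape (H, W) with values in [0, 1]
    """
    alpha = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return alpha * enhanced + (1.0 - alpha) * frame

# Toy data: a black frame, a white "enhanced" image, and a mask covering the left half.
frame = np.zeros((2, 4, 3))
enhanced = np.ones((2, 4, 3))
mask = np.zeros((2, 4))
mask[:, :2] = 1.0

out = blend_with_mask(frame, enhanced, mask)
print(out[0, :, 0])  # left two pixels come from `enhanced`, right two from `frame`
```

With a soft mask (values between 0 and 1 near the boundary), the same formula would fade between the two images instead of switching abruptly.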

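3. Code

The code is on GitHub in a form that runs on Google Colab, and the explanation below follows it. If you want to run it yourself, click this "link", then click the "Open in Colab" button at the top of the notebook that appears.

First, run the setup.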

#@title **Set Up**
!git clone https://github.com/cedro3/wav2lip-hq.git
%cd wav2lip-hq
!pip3 install gdown
!pip3 install -r requirements.txt

!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "face_detection/detection/sfd/s3fd.pth"

from function import display_mp4
! mkdir download

Next, download the three pretrained weight files (pth) from Google Drive. An error may occur at this step.

#@title **download pth files**
import gdown

urls = {
    "wav2lip_gan.pth": "10Iu05Modfti3pDbxCFPnofmfVlbkvrCm", 
    "face_segmentation.pth": "154JgKpzCPW82qINcVieuPH3fZ2e0P812",
    "esrgan_max.pth": "1e5LT83YckB5wFKXWV4cWOPkVRnCDmvwQ"
}

for name, id in urls.items():
    url = f"https://drive.google.com/uc?id={id}"
    output = f"checkpoints/{name}"
    gdown.download(url, output, quiet=False)
    print(f"Loaded {name}")
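Because the loop prints "Loaded" whether or not a download succeeded, it can help to check which files actually arrived. A minimal sketch, assuming the three file names used above; `find_missing` is a hypothetical helper, not part of the repository, and it treats an empty file as missing as a precaution.

```python
import os

def find_missing(folder, names):
    """Return the names that are absent from `folder` or are empty files."""
    missing = []
    for name in names:
        path = os.path.join(folder, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing

expected = ["wav2lip_gan.pth", "face_segmentation.pth", "esrgan_max.pth"]
print("Missing:", find_missing("checkpoints", expected))
```

An empty list means all three files are in place.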

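In one run, for example, the downloads of two of the pretrained files, wav2lip_gan.pth and esrgan_max.pth, failed with "Access denied with the following error:". If this happens, download the affected pretrained files (pth) from their links in your browser, then run the next block to connect Colab to Google Drive. If no error occurred, skip the next block.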

#@title (If you get "Access denied with the following error")
#@markdown If you get "Access denied with the following error" in the above block, run this block to connect colab and google drive. And then do:
#@markdown 1. You download the pth file using your browser and upload the pth file to your google drive.
#@markdown 2. Drag and drop the pth file from google drive to the checkpoints folder using colab's left window (it takes a while to react).
from google.colab import drive
drive.mount('/content/drive')

After running the block above and connecting Colab to Google Drive, upload the pretrained files (pth) to Google Drive. Then, in the file pane on the left side of Colab, drag and drop the files from Google Drive into the checkpoints folder (it takes a moment to respond).
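If dragging files in the file pane is slow, the same move could be done in code. A minimal sketch, assuming the files were uploaded to the top level of My Drive, which Colab normally exposes at /content/drive/MyDrive after mounting at /content/drive; `copy_checkpoints` is a hypothetical helper, not part of the repository.

```python
import os
import shutil

def copy_checkpoints(src_dir, dst_dir, names):
    """Copy each named file from src_dir to dst_dir; return the names not found in src_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    not_found = []
    for name in names:
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy(src, os.path.join(dst_dir, name))
        else:
            not_found.append(name)
    return not_found

# Assumed upload location: the top level of My Drive.
drive_dir = "/content/drive/MyDrive"
if os.path.isdir(drive_dir):  # only attempt the copy when Drive is mounted
    not_found = copy_checkpoints(drive_dir, "checkpoints",
                                 ["wav2lip_gan.pth", "esrgan_max.pth"])
    print("Not found in Drive:", not_found)
```

Adjust the list of names to whichever downloads failed.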

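Now let's run the lip sync. Set the video (mp4) in face and the audio (mp4) in audio. If the audio is shorter than the video, the video is cut to the length of the audio; if the audio is longer than the video, the video repeats from the beginning. Here we use the sample files, with face set to taki.mp4 and audio set to china.mp4.

To use your own video or audio, upload the files to the videos folder.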

#@title **Lip-sync**
face = "taki.mp4" #@param {type:"string"}
audio = "china.mp4" #@param {type:"string"}
face_path = "videos/"+face
audio_path = "videos/"+audio

!python inference.py \
        --checkpoint_path "checkpoints/wav2lip_gan.pth" \
        --segmentation_path "checkpoints/face_segmentation.pth" \
        --sr_path "checkpoints/esrgan_max.pth" \
        --face $face_path \
        --audio $audio_path \
        --outfile "results/output.mp4"
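The length-matching rule described above (cut the video when the audio is shorter, loop it when the audio is longer) can be sketched as an index calculation. This only illustrates the described behavior and is not the code in inference.py; `frame_indices` and its parameters are hypothetical.

```python
import math

def frame_indices(num_video_frames, fps, audio_seconds):
    """Return the video frame index to show for each output frame.

    The output has as many frames as the audio lasts. Indices wrap around
    with modulo, so a short video repeats from its first frame and a long
    video is simply cut off.
    """
    num_output_frames = math.ceil(audio_seconds * fps)
    return [i % num_video_frames for i in range(num_output_frames)]

# A 4-frame clip at 2 fps (2 seconds long):
print(frame_indices(4, 2, 1.0))  # audio shorter than video: [0, 1]
print(frame_indices(4, 2, 3.0))  # audio longer than video: [0, 1, 2, 3, 0, 1]
```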

Play the generated video.

#@title **play movie**
display_mp4('results/output.mp4')
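display_mp4 is imported from the repository's function module, and its implementation is not shown in this post. A common way to play an mp4 inline in a notebook is to embed it as a base64 data URL in an HTML video tag; the sketch below builds such a tag with the standard library only. `video_tag` is a hypothetical name, and the real helper may work differently.

```python
import base64

def video_tag(path, width=480):
    """Return an HTML <video> element with the file at `path` embedded as base64."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return (
        f'<video width="{width}" controls>'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4">'
        "</video>"
    )
```

In a notebook, passing the returned string to IPython.display.HTML, for example HTML(video_tag('results/output.mp4')), would render the player. Embedding the whole file this way suits short clips; large videos make the notebook heavy.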

https://cedro3.com/wp-content/uploads/2023/01/taki_china.mp4

Download the generated video.

#@title **Download movie**
import os
import shutil
from google.colab import files

file_path = 'download/'+os.path.splitext(face)[0]+'_'+os.path.splitext(audio)[0]+'.mp4'
shutil.copy('results/output.mp4', file_path)
files.download(file_path)
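The download name is built from the two input file names with their extensions removed. As a small worked example of that rule (the function name `download_name` is hypothetical):

```python
import os

def download_name(face, audio, folder="download"):
    """Join the stems of the two input names with an underscore, as the block above does."""
    face_stem = os.path.splitext(face)[0]
    audio_stem = os.path.splitext(audio)[0]
    return f"{folder}/{face_stem}_{audio_stem}.mp4"

print(download_name("taki.mp4", "china.mp4"))  # download/taki_china.mp4
```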

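See you next time.

(Original GitHub) https://github.com/Markfryazino/wav2lip-hq

(Twitter post)

I wrote a blog post!

This time I introduce wav2lip-HQ, a technique that moves the mouth of a person in a video to match audio.

This example drives the same video with audio in three languages (Chinese, Russian, and Japanese).

Blog: https://t.co/L7uFLz0xCI pic.twitter.com/GKhdLMjoqT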
— cedro (@jun40vn) January 18, 2023