Achieving FaceSwap with SberSwap, without an individual training process

cedro

4 years ago

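1. Introduction

　Until now, FaceSwap has had one drawback: it needs a separate training process for each swap, so processing takes a long time. This post introduces SberSwap, a technique that achieves FaceSwap without that individual training process.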

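2. What is SberSwap?

　The figure below shows the SberSwap model, which is called AEI-Net and consists of three parts.

　The first is the Identity Encoder, which computes the vector Zid from the image Xs. The second is the Multi-level Attributes Encoder, which has a U-Net-like structure and extracts the features Zatt from the image Xt. The third is the AAD Generator, which generates the desired image from this information.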
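
　To make the three-part data flow concrete, here is a minimal shape-level sketch in NumPy. Everything in it is a toy stand-in that I made up for illustration: the function names, the 3-level pyramid, the random projection and the gating formula are assumptions, not the real AEI-Net layers. It only shows which input feeds which part.

```python
import numpy as np

rng = np.random.default_rng(0)

def identity_encoder(x_s, dim=512):
    """Toy stand-in: image X_s -> unit-length identity vector Z_id."""
    pooled = x_s.mean(axis=(0, 1))               # (3,) per-channel means
    w = rng.standard_normal((pooled.size, dim))  # random projection, not trained
    z = pooled @ w
    return z / np.linalg.norm(z)

def attributes_encoder(x_t, levels=3):
    """Toy stand-in: image X_t -> feature maps Z_att at several resolutions.
    Assumes height and width are divisible by 2**levels."""
    feats, f = [], x_t
    for _ in range(levels):
        h, w, c = f.shape
        f = f.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))  # 2x2 average pooling
        feats.append(f)
    return feats

def aad_generator(z_id, z_att):
    """Toy stand-in: walk from the coarsest map to the finest, mixing the
    running output with each attribute map using weights derived from Z_id."""
    gate = 1.0 / (1.0 + np.exp(-z_id[:3]))       # hypothetical per-channel mixing weights
    out = z_att[-1]
    for f in reversed(z_att[:-1]):
        out = out.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour upsampling
        out = gate * out + (1.0 - gate) * f
    return out.repeat(2, axis=0).repeat(2, axis=1)

x_s = rng.random((224, 224, 3))   # source image (identity)
x_t = rng.random((224, 224, 3))   # target image (attributes)
z_id = identity_encoder(x_s)
z_att = attributes_encoder(x_t)
y = aad_generator(z_id, z_att)
print(z_id.shape, [f.shape for f in z_att], y.shape)
```

　The point to take away is only the wiring: Xs contributes a single vector, Xt contributes a pyramid of feature maps, and the generator consumes both to produce an image the same size as Xt.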

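3. Code

　I have put the code on GitHub in a form that runs on Google Colab, so I will walk through it here. If you want to run it yourself, click this "link" and then click the "Colab on Web" button at the top of the notebook that opens.

　First, we do the setup.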

#@markdown #**Setup**

# Clone github 
!git clone https://github.com/cedro3/sber-swap.git
%cd sber-swap

# load arcface
!wget -P ./arcface_model https://github.com/sberbank-ai/sber-swap/releases/download/arcface/backbone.pth
!wget -P ./arcface_model https://github.com/sberbank-ai/sber-swap/releases/download/arcface/iresnet.py

# load landmarks detector
!wget -P ./insightface_func/models/antelope https://github.com/sberbank-ai/sber-swap/releases/download/antelope/glintr100.onnx
!wget -P ./insightface_func/models/antelope https://github.com/sberbank-ai/sber-swap/releases/download/antelope/scrfd_10g_bnkps.onnx

# load model itself
!wget -P ./weights https://github.com/sberbank-ai/sber-swap/releases/download/sber-swap-v2.0/G_unet_2blocks.pth

# load super res model
!wget -P ./weights https://github.com/sberbank-ai/sber-swap/releases/download/super-res/10_net_G.pth

# Install required libraries
!pip install mxnet-cu112
!pip install onnxruntime-gpu==1.12
!pip install insightface==0.2.1
!pip install kornia==0.5.4

!rm /usr/local/lib/python3.10/dist-packages/insightface/model_zoo/model_zoo.py #change the path to python in case you use a different version
!wget -P /usr/local/lib/python3.10/dist-packages/insightface/model_zoo/ https://github.com/AlexanderGroshev/insightface/releases/download/model_zoo/model_zoo.py #change the path to python in case you use a different version

# library import
import cv2
import torch
import time
import os
from utils.inference.image_processing import crop_face, get_final_image, show_images
from utils.inference.video_processing import read_video, get_target, get_final_video, add_audio_from_another_video, face_enhancement
from utils.inference.core import model_inference
from network.AEI_Net import AEI_Net
from coordinate_reg.image_infer import Handler
from insightface_func.face_detect_crop_multi import Face_detect_crop
from arcface_model.iresnet import iresnet100
from models.pix2pix_model import Pix2PixModel
from models.config_sr import TestOptions


# --- Initialize models ---
app = Face_detect_crop(name='antelope', root='./insightface_func/models')
app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640))

# main model for generation
G = AEI_Net(backbone='unet', num_blocks=2, c_id=512)
G.eval()
G.load_state_dict(torch.load('weights/G_unet_2blocks.pth', map_location=torch.device('cpu')))
G = G.cuda()
G = G.half()

# arcface model to get face embedding
netArc = iresnet100(fp16=False)
netArc.load_state_dict(torch.load('arcface_model/backbone.pth'))
netArc=netArc.cuda()
netArc.eval()

# model to get face landmarks
handler = Handler('./coordinate_reg/model/2d106det', 0, ctx_id=0, det_size=640)

# model to make superres of face, set use_sr=True if you want to use super resolution or use_sr=False if you don't
use_sr = True
if use_sr:
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    torch.backends.cudnn.benchmark = True
    opt = TestOptions()
    #opt.which_epoch ='10_7'
    model = Pix2PixModel(opt)
    model.netG.train()
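
　The setup cell hard-codes the python3.10 dist-packages path when it replaces model_zoo.py. As a hedged alternative, the path can be looked up from the installed package itself. The helper below is my own sketch (the name locate_in_package is hypothetical) and uses only the standard library. It is demonstrated on the json package so that it runs anywhere.

```python
from importlib.util import find_spec
from pathlib import Path

def locate_in_package(package: str, relative: str) -> Path:
    """Return the path of a file inside an installed top-level package."""
    spec = find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package!r} is not an installed package")
    # A regular package has one search location: its own directory.
    return Path(list(spec.submodule_search_locations)[0]) / relative

# Demonstration on the standard library's json package.
demo = locate_in_package("json", "decoder.py")
print(demo)
```

　In the notebook, the equivalent call would be locate_in_package("insightface", "model_zoo/model_zoo.py"), run after the pip install step, with the result used in place of the hard-coded path.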

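　First, let's Faceswap an image. Specify an image for source and another for target and run the cell, and the face in target is replaced with the face from source. Here, source is set to Nanako Matsushima and target to Satomi Ishihara.

　If you want to use your own images, upload them to examples/images.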

#@title #**Faceswap an image**
#@markdown ・Upload your own images to examples/images
source = 'nanako.jpg' #@param {type:"string"}
target = 'satomi.jpg' #@param {type:"string"}
source_path = 'examples/images/'+source
target_path = 'examples/images/' + target

source_full = cv2.imread(source_path)
crop_size = 224 # don't change this
batch_size =  40

source = crop_face(source_full, app, crop_size)[0]
source = [source[:, :, ::-1]]

target_full = cv2.imread(target_path)
full_frames = [target_full]
target = get_target(full_frames, app, crop_size)

final_frames_list, crop_frames_list, full_frames, tfm_array_list = model_inference(full_frames,
                                                                                   source,
                                                                                   target,
                                                                                   netArc,
                                                                                   G,
                                                                                   app,
                                                                                   set_target = False,
                                                                                   crop_size=crop_size,
                                                                                   BS=batch_size)

result = get_final_image(final_frames_list, crop_frames_list, full_frames[0], tfm_array_list, handler)
cv2.imwrite('examples/results/result.png', result)
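
　The slice source[:, :, ::-1] in the cell above reverses the channel axis. cv2.imread returns pixels in BGR order, so the slice turns the cropped face into RGB. That the model expects RGB input is my reading of the code, not something the repository states here. A tiny NumPy example of what the slice does:

```python
import numpy as np

# A 1x2 "image" in OpenCV's BGR order: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
rgb = bgr[:, :, ::-1]

# Applying the same slice again restores the original BGR array.
restored = rgb[:, :, ::-1]

print(rgb[0, 0].tolist(), rgb[0, 1].tolist())
```

　The slice returns a view rather than a copy, so it costs nothing for a single frame.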

#@markdown #**Display the image**
#@markdown ・The image is saved to examples/results/result.png
import matplotlib.pyplot as plt

show_images([source[0][:, :, ::-1], target_full, result], ['Source Image', 'Target Image', 'Swapped Image'], figsize=(20, 15))

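　You can see that only the face in the photo of Satomi Ishihara has been replaced with Nanako Matsushima's.

　Next, let's Faceswap a video. Set an image for source and a video for video and run the cell, and the face in video is replaced with the face from source. Here, source is set to Angelina Jolie and video to Yui Aragaki.

　If you want to use your own image or video, upload images to examples/images and videos to examples/videos. Because everything is processed in memory, keep videos to about 20 seconds at HD resolution (longer ones will crash the session).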
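
　To get a feel for why long videos run out of memory, here is a rough back-of-the-envelope estimate for the decoded frames alone. It assumes 1280x720 frames, 30 fps and 3 bytes per pixel (uint8, 3 channels). These numbers are my assumptions for illustration, and the intermediate crops and model outputs held by the pipeline come on top.

```python
def raw_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes needed for one uncompressed uint8 frame."""
    return width * height * channels

def raw_video_bytes(width: int, height: int, fps: float, seconds: float) -> int:
    """Bytes needed to hold every decoded frame of a clip in memory at once."""
    n_frames = round(fps * seconds)
    return raw_frame_bytes(width, height) * n_frames

total = raw_video_bytes(1280, 720, fps=30, seconds=20)
print(f"{total} bytes = {total / 2**30:.2f} GiB")
```

　Under these assumptions a 20-second clip already takes roughly 1.5 GiB for the raw frames, and the figure grows linearly with the clip length.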

#@title #**Faceswap a video**
#@markdown ・Upload your own images to examples/images\
#@markdown ・Upload your own videos to examples/videos
source = 'angelina.jpg' #@param {type:"string"}
video = 'yui.mp4' #@param {type:"string"}
source_path = 'examples/images/'+source
path_to_video = 'examples/videos/'+video

source_full = cv2.imread(source_path)
OUT_VIDEO_NAME = "examples/results/result.mp4"
crop_size = 224 # don't change this
batch_size =  40

source = crop_face(source_full, app, crop_size)[0]
source = [source[:, :, ::-1]]

full_frames, fps = read_video(path_to_video)
target = get_target(full_frames, app, crop_size)

START_TIME = time.time()

final_frames_list, crop_frames_list, full_frames, tfm_array_list = model_inference(full_frames,
                                                                                   source,
                                                                                   target,
                                                                                   netArc,
                                                                                   G,
                                                                                   app,
                                                                                   set_target = False,
                                                                                   crop_size=crop_size,
                                                                                   BS=batch_size)

if use_sr:
    final_frames_list = face_enhancement(final_frames_list, model)

get_final_video(final_frames_list,
                crop_frames_list,
                full_frames,
                tfm_array_list,
                OUT_VIDEO_NAME,
                fps, 
                handler)
  
add_audio_from_another_video(path_to_video, OUT_VIDEO_NAME, "audio")

print(f'Full pipeline took {time.time() - START_TIME}')
print(f"Video saved with path {OUT_VIDEO_NAME}")

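#@markdown #**Display the video**
#@markdown ・The video is saved to examples/results/result.mp4
from IPython.display import HTML
from base64 import b64encode

# Read-only binary mode is enough here; the context manager closes the file.
with open(OUT_VIDEO_NAME, "rb") as f:
    video_file = f.read()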
video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"

HTML(f"""<video width={800} controls><source src="{video_url}"></video>""")
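
　The display cell above embeds the whole file in the page as a base64 data URL. The sketch below shows the same construction on a 3-byte sample, plus the size of the encoded text. The helper names to_data_url and base64_length are my own, chosen for illustration.

```python
from base64 import b64encode
from math import ceil

def to_data_url(data: bytes, mime: str = "video/mp4") -> str:
    """Wrap raw bytes in a data: URL that a <source src=...> tag can use."""
    return f"data:{mime};base64,{b64encode(data).decode()}"

def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 text for n_bytes of input (4 chars per 3 bytes)."""
    return 4 * ceil(n_bytes / 3)

sample = b"abc"
url = to_data_url(sample)
print(url, base64_length(len(sample)))
```

　Since base64 inflates the data by about a third, this way of showing a video is best kept to short clips.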

https://cedro3.com/wp-content/uploads/2022/01/yui_angelina_result.mp4

　Only the face in the video of Yui Aragaki has been replaced with Angelina Jolie's.

　See you next time.

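(Original GitHub) https://github.com/sberbank-ai/sber-swap

(Twitter post)

FaceSwap, the well-known deepfake technique for swapping faces in videos, requires a lengthy training process using two videos.
SberSwap swaps faces using only a video and a single image, without this training process.
Here is an example where various faces have been swapped with Angelina Jolie's.

Blog: https://t.co/QBH96iD1tr pic.twitter.com/v9FKHCDg1l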
— cedro (@jun40vn) February 8, 2022