AI (Artificial Intelligence) 2022.01.31 2023.06.05 cedro

Face swapping with SberSwap, with no per-swap training process

1. Introduction

Until now, FaceSwap has required a separate training process for each swap, which made it slow. This post introduces SberSwap, a technique that swaps faces without any per-swap training.

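2. What is SberSwap?

The figure below shows the SberSwap model. It is called AEI-Net and consists of three parts.

The first is the Identity Encoder, which computes the vector Zid from the image Xs. The second is the Multi-level Attributes Encoder, which has a U-Net-like structure and extracts the features Zatt from the image Xt. The third is the AAD Generator, which generates the desired image from this information.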
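The data flow described above can be sketched in plain Python. This is only a toy illustration, not the SberSwap implementation: all three functions are hypothetical stand-ins that work on short lists of numbers instead of images, and the blending rule (a per-element mask that mixes an identity-modulated and an attribute-modulated activation) is a simplified assumption modelled on the AAD layer of FaceShifter, the paper in which AEI-Net was introduced. Normalization and learned convolutions are left out.

```python
import math


def identity_encoder(xs):
    """Toy stand-in: summarise the source image Xs as an identity vector Zid."""
    mean = sum(xs) / len(xs)
    spread = max(xs) - min(xs)
    return (mean, spread)


def attributes_encoder(xt, levels=2):
    """Toy stand-in: multi-level features Zatt from the target image Xt.

    Each level halves the resolution by averaging neighbours, loosely
    mirroring the downsampling path of a U-Net. len(xt) must be a
    multiple of 2**levels.
    """
    features = [list(xt)]
    for _ in range(levels):
        prev = features[-1]
        features.append(
            [(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)]
        )
    return features  # ordered fine -> coarse


def aad_generator(z_id, z_att):
    """Toy stand-in: walk from the coarsest level to the finest, blending
    identity and attribute information at every level."""
    gamma_id, beta_id = z_id
    h = [0.0] * len(z_att[-1])
    for level in reversed(z_att):  # coarse -> fine
        if len(h) < len(level):
            h = [v for v in h for _ in range(2)]  # nearest-neighbour upsample x2
        blended = []
        for value, att in zip(h, level):
            identity_part = gamma_id * value + beta_id  # driven by Zid
            attribute_part = value + att                # driven by Zatt
            mask = 1.0 / (1.0 + math.exp(-value))       # sigmoid mask in (0, 1)
            blended.append((1.0 - mask) * attribute_part + mask * identity_part)
        h = blended
    return h


xs = [0.2, 0.4, 0.6, 0.8]                          # stands in for the source image
xt = [0.1, 0.3, 0.5, 0.7, 0.9, 0.7, 0.5, 0.3]      # stands in for the target image
y = aad_generator(identity_encoder(xs), attributes_encoder(xt))
print(len(y))
```

The point of the sketch is the wiring: the source only contributes through Zid, the target only through the multi-level Zatt, and the generator combines the two at every resolution.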

3. Code

The code is on GitHub in a form that runs on Google Colab, and the explanation below follows it. To run it yourself, click this "link" and then click the "Colab on Web" button at the top of the notebook that opens.

First, run the setup.

#@markdown #**Setup**

# Clone github 
!git clone https://github.com/cedro3/sber-swap.git
%cd sber-swap

# load arcface
!wget -P ./arcface_model https://github.com/sberbank-ai/sber-swap/releases/download/arcface/backbone.pth
!wget -P ./arcface_model https://github.com/sberbank-ai/sber-swap/releases/download/arcface/iresnet.py

# load landmarks detector
!wget -P ./insightface_func/models/antelope https://github.com/sberbank-ai/sber-swap/releases/download/antelope/glintr100.onnx
!wget -P ./insightface_func/models/antelope https://github.com/sberbank-ai/sber-swap/releases/download/antelope/scrfd_10g_bnkps.onnx

# load model itself
!wget -P ./weights https://github.com/sberbank-ai/sber-swap/releases/download/sber-swap-v2.0/G_unet_2blocks.pth

# load super res model
!wget -P ./weights https://github.com/sberbank-ai/sber-swap/releases/download/super-res/10_net_G.pth

# Install required libraries
!pip install mxnet-cu112
!pip install onnxruntime-gpu==1.12
!pip install insightface==0.2.1
!pip install kornia==0.5.4

!rm /usr/local/lib/python3.10/dist-packages/insightface/model_zoo/model_zoo.py #change the path to python in case you use a different version
!wget -P /usr/local/lib/python3.10/dist-packages/insightface/model_zoo/ https://github.com/AlexanderGroshev/insightface/releases/download/model_zoo/model_zoo.py #change the path to python in case you use a different version

# library import
import cv2
import torch
import time
import os
from utils.inference.image_processing import crop_face, get_final_image, show_images
from utils.inference.video_processing import read_video, get_target, get_final_video, add_audio_from_another_video, face_enhancement
from utils.inference.core import model_inference
from network.AEI_Net import AEI_Net
from coordinate_reg.image_infer import Handler
from insightface_func.face_detect_crop_multi import Face_detect_crop
from arcface_model.iresnet import iresnet100
from models.pix2pix_model import Pix2PixModel
from models.config_sr import TestOptions


# --- Initialize models ---
app = Face_detect_crop(name='antelope', root='./insightface_func/models')
app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640))

# main model for generation
G = AEI_Net(backbone='unet', num_blocks=2, c_id=512)
G.eval()
G.load_state_dict(torch.load('weights/G_unet_2blocks.pth', map_location=torch.device('cpu')))
G = G.cuda()
G = G.half()

# arcface model to get face embedding
netArc = iresnet100(fp16=False)
netArc.load_state_dict(torch.load('arcface_model/backbone.pth'))
netArc=netArc.cuda()
netArc.eval()

# model to get face landmarks
handler = Handler('./coordinate_reg/model/2d106det', 0, ctx_id=0, det_size=640)

# model to make superres of face, set use_sr=True if you want to use super resolution or use_sr=False if you don't
use_sr = True
if use_sr:
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'
    torch.backends.cudnn.benchmark = True
    opt = TestOptions()
    #opt.which_epoch ='10_7'
    model = Pix2PixModel(opt)
    model.netG.train()
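The setup cell hard-codes /usr/local/lib/python3.10/dist-packages and asks you to edit the path for other Python versions. One hedged alternative is to look up the installed package's directory at run time. The sketch below uses only the standard library; `package_dir` is a hypothetical helper name, not part of the notebook, and it assumes the package is a regular (non-namespace) package installed in the current environment.

```python
import importlib.util
from pathlib import Path


def package_dir(package_name):
    """Return the directory of an installed top-level package, or None."""
    spec = importlib.util.find_spec(package_name)
    # spec is None when the package is missing; origin is None for
    # namespace packages, which have no single directory.
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


insightface_dir = package_dir("insightface")
if insightface_dir is not None:
    # This is the file the setup cell removes and re-downloads.
    print(insightface_dir / "model_zoo" / "model_zoo.py")
else:
    print("insightface is not installed in this environment")
```

In a notebook, the printed path could then be used in the `!rm` and `!wget -P` lines instead of the fixed python3.10 path.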

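First, let's face-swap a still image. Set source and target to image files and run the cell; the face in target is replaced with the face from source. Here, source is Nanako Matsushima and target is Satomi Ishihara.

To use your own images, upload them to examples/images.

#@title #**Face-swap an image**
#@markdown - Upload your own images to examples/images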
source = 'nanako.jpg' #@param {type:"string"}
target = 'satomi.jpg' #@param {type:"string"}
source_path = 'examples/images/'+source
target_path = 'examples/images/' + target

source_full = cv2.imread(source_path)
crop_size = 224 # don't change this
batch_size =  40

source = crop_face(source_full, app, crop_size)[0]
source = [source[:, :, ::-1]]

target_full = cv2.imread(target_path)
full_frames = [target_full]
target = get_target(full_frames, app, crop_size)

final_frames_list, crop_frames_list, full_frames, tfm_array_list = model_inference(full_frames,
                                                                                   source,
                                                                                   target,
                                                                                   netArc,
                                                                                   G,
                                                                                   app,
                                                                                   set_target = False,
                                                                                   crop_size=crop_size,
                                                                                   BS=batch_size)

result = get_final_image(final_frames_list, crop_frames_list, full_frames[0], tfm_array_list, handler)
cv2.imwrite('examples/results/result.png', result)
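One line in the cell above that is easy to miss is `source[:, :, ::-1]`. OpenCV's `cv2.imread` returns pixels in BGR channel order, and reversing the last axis turns them into RGB, presumably because the downstream models expect RGB (an assumption; the post does not say). The pure-Python sketch below performs the same reversal on a tiny nested-list "image"; `reverse_channels` is a hypothetical helper for illustration only.

```python
def reverse_channels(image):
    """Reverse the channel order of every pixel (BGR <-> RGB).

    `image` is a list of rows, each row a list of (c0, c1, c2) tuples.
    On a NumPy array the equivalent operation is image[:, :, ::-1].
    """
    return [[pixel[::-1] for pixel in row] for row in image]


# A 1x2 image as OpenCV would store it: pure blue, then pure red, in BGR.
bgr = [[(255, 0, 0), (0, 0, 255)]]
rgb = reverse_channels(bgr)
print(rgb)
```

Applying the reversal twice restores the original order, which is why the display cell can reverse the source crop again before showing it.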

#@markdown #**Show the image**
#@markdown - The image is saved to examples/results/result.png
import matplotlib.pyplot as plt

show_images([source[0][:, :, ::-1], target_full, result], ['Source Image', 'Target Image', 'Swapped Image'], figsize=(20, 15))

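You can see that only the face in the photo of Satomi Ishihara has been replaced with Nanako Matsushima's.

Next, let's face-swap a video. Set source to an image and video to a video file and run the cell; the face in video is replaced with the face from source. Here, source is Angelina Jolie and video is Yui Aragaki.

To use your own files, upload images to examples/images and videos to examples/videos. Because everything is processed in memory, keep videos to about 20 seconds at HD resolution (longer videos will crash the session).

#@title #**Face-swap a video**
#@markdown - Upload your own images to examples/images\
#@markdown - Upload your own videos to examples/videos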
source = 'angelina.jpg' #@param {type:"string"}
video = 'yui.mp4' #@param {type:"string"}
source_path = 'examples/images/'+source
path_to_video = 'examples/videos/'+video

source_full = cv2.imread(source_path)
OUT_VIDEO_NAME = "examples/results/result.mp4"
crop_size = 224 # don't change this
batch_size =  40

source = crop_face(source_full, app, crop_size)[0]
source = [source[:, :, ::-1]]

full_frames, fps = read_video(path_to_video)
target = get_target(full_frames, app, crop_size)

START_TIME = time.time()

final_frames_list, crop_frames_list, full_frames, tfm_array_list = model_inference(full_frames,
                                                                                   source,
                                                                                   target,
                                                                                   netArc,
                                                                                   G,
                                                                                   app,
                                                                                   set_target = False,
                                                                                   crop_size=crop_size,
                                                                                   BS=batch_size)

if use_sr:
    final_frames_list = face_enhancement(final_frames_list, model)

get_final_video(final_frames_list,
                crop_frames_list,
                full_frames,
                tfm_array_list,
                OUT_VIDEO_NAME,
                fps, 
                handler)
  
add_audio_from_another_video(path_to_video, OUT_VIDEO_NAME, "audio")

print(f'Full pipeline took {time.time() - START_TIME}')
print(f"Video saved with path {OUT_VIDEO_NAME}")
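The earlier note about keeping HD clips to roughly 20 seconds can be made concrete with a back-of-the-envelope estimate. The sketch below assumes every decoded frame is held in memory as an 8-bit, 3-channel array, which is an unverified assumption about how `read_video` stores frames. The resolutions and the 30 fps frame rate are also assumptions, and both function names are hypothetical.

```python
def raw_frame_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size in bytes of one uncompressed frame."""
    return width * height * channels * bytes_per_channel


def video_memory_gib(width, height, fps, seconds):
    """Approximate memory needed to hold every decoded frame, in GiB."""
    frames = int(fps * seconds)
    return frames * raw_frame_bytes(width, height) / 1024 ** 3


for label, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
    print(f"{label}: {video_memory_gib(w, h, fps=30, seconds=20):.2f} GiB")
```

Under these assumptions a 20-second clip needs about 1.5 GiB at 720p and about 3.5 GiB at 1080p for the input frames alone. The cropped faces and the generated frames come on top of that, which is consistent with longer clips exhausting Colab's memory.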

#@markdown #**Show the video**
#@markdown - The video is saved to examples/results/result.mp4
from IPython.display import HTML
from base64 import b64encode

# Read-only binary mode is enough; the with-block closes the file afterwards
with open(OUT_VIDEO_NAME, "rb") as f:
    video_file = f.read()
video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"

HTML(f"""<video width={800} controls><source src="{video_url}"></video>""")
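The display cell embeds the whole video in the page as a base64 data URL. Base64 encodes every 3 bytes as 4 characters, so the embedded string is about a third larger than the file, which is another reason to keep clips short. The sketch below illustrates the overhead with the standard library only; `base64_length` and `data_url_length` are hypothetical helper names.

```python
from base64 import b64encode


def base64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


def data_url_length(payload, mime="video/mp4"):
    """Length of the data URL that would embed `payload` in an HTML page."""
    prefix = f"data:{mime};base64,"
    return len(prefix) + len(b64encode(payload))


payload = bytes(300)  # 300 zero bytes standing in for a video file
print(base64_length(len(payload)), data_url_length(payload))
```

For files that are too large to embed comfortably, downloading the result from examples/results is the simpler option.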

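Only the face in the Yui Aragaki video has been replaced with Angelina Jolie's.

See you next time.

(Original GitHub) https://github.com/sberbank-ai/sber-swap

(Twitter post)

FaceSwap, the deepfake technique for swapping faces in well-known videos, requires a long training process using two videos.
SberSwap swaps faces using just a video and a single image, without that training process.
This is an example of replacing various faces with Angelina Jolie's face.

Blog: https://t.co/QBH96iD1tr pic.twitter.com/v9FKHCDg1l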
— cedro (@jun40vn) February 8, 2022

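41 comments

The inference cell in the official (SberBank AI) Colab notebook lets you vary the batch size, but it appears to be omitted in cedro's notebook.
I would appreciate it if you could explain why it was omitted.
(In the official version, running inference with batch sizes of 40, 100, and 1 produced exactly the same video every time, so does this mean that specifying the batch size currently has no effect?)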
