Estimating 3D Models and Tracking People with 4D-Humans

1. Introduction

This post introduces 4D-Humans, a technique that can estimate a 3D model of a person even in unusual poses.

2. What is 4D-Humans?

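The figure below gives an overview of 4D-Humans. First, a transformer-based network called HMR 2.0 extracts features of the human pose and estimates a 3D model (left figure). This makes model estimation possible even for unusual poses that earlier methods struggled with.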

Next, using the 3D model estimates, it compares one frame with the next and applies an appropriate correction, which also enables accurate tracking of people (right figure).
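To make the idea of comparing one frame with the next more concrete, here is a minimal sketch of greedy nearest-neighbor association between people in consecutive frames. This is not the tracker that 4D-Humans actually uses. The feature vectors, the `associate` function, and the `max_dist` threshold are all hypothetical and chosen only for illustration.

```python
import math

def associate(prev_tracks, detections, max_dist=1.0):
    """Greedily match current-frame detections to tracks from the previous frame.

    prev_tracks: dict of track_id -> feature vector (hypothetical pose features)
    detections:  list of feature vectors from the current frame
    Returns a dict of detection index -> track_id.
    """
    # Score every (track, detection) pair by Euclidean distance, closest first.
    pairs = sorted(
        (math.dist(feat, det), tid, i)
        for tid, feat in prev_tracks.items()
        for i, det in enumerate(detections)
    )
    assigned, used_tracks = {}, set()
    for dist, tid, i in pairs:
        if dist > max_dist:
            break  # every remaining pair is farther apart than the threshold
        if i in assigned or tid in used_tracks:
            continue
        assigned[i] = tid
        used_tracks.add(tid)
    # Detections left unmatched start new tracks.
    next_id = max(prev_tracks, default=-1) + 1
    for i in range(len(detections)):
        if i not in assigned:
            assigned[i] = next_id
            next_id += 1
    return assigned

prev = {0: (0.0, 0.0), 1: (5.0, 5.0)}
curr = [(5.1, 4.9), (0.2, -0.1), (9.0, 9.0)]
result = associate(prev, curr)
print(result)  # {0: 1, 1: 0, 2: 2}
```

The first two detections inherit the identities of the nearest existing tracks, and the third, which is far from both, receives a new id.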

3. Code

The code is on GitHub in a form that runs on Google Colab, and the explanation below follows it. To run it yourself, click this "link", then click the "Open in Colab" button at the top of the notebook that appears.

First, run the setup.

#@title Set up

# Clone the main repo
! git clone https://github.com/cedro3/4D-Humans.git 4D-Humans
%cd 4D-Humans

# install library
!pip install torch
!pip install -e .[all]
!pip install git+https://github.com/brjathu/PHALP.git

# define function
import os
import shutil
def reset_folder(path):
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path,exist_ok=True)
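As a quick illustration of what `reset_folder` does, the sketch below repeats the helper and exercises it in a temporary directory so that nothing in the repository is touched. The temporary paths and the `old.jpg` file name are made up for the demonstration.

```python
import os
import shutil
import tempfile

def reset_folder(path):
    # Same helper as in the setup cell: delete the folder if present, then recreate it.
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path, exist_ok=True)

# Work in a throwaway location.
base = tempfile.mkdtemp()
work = os.path.join(base, "demo_in")

reset_folder(work)                                # creates the folder
open(os.path.join(work, "old.jpg"), "w").close()  # leave a stale file behind
reset_folder(work)                                # wipes and recreates it

remaining = os.listdir(work)
print(remaining)  # []
shutil.rmtree(base)
```

Because the folder is emptied on every call, results from a previous run never mix with the current one.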

Let's start by estimating a 3D model from a still image. Specify the image (jpg) in pic and run the cell. To use your own image (jpg), upload it to example_data/images/.

#@title 4DHumans for picture
import shutil

# setting
reset_folder('demo_in')
reset_folder('demo_out')
pic = 'pexels-anete-lusina-4793258.jpg'#@param {type:"string"}
shutil.copy('example_data/images/'+pic, 'demo_in/'+pic)

# run 
!python demo.py \
--img_folder demo_in \
--out_folder demo_out \
--batch_size=48 --side_view
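If the file name in `pic` is mistyped, `shutil.copy` fails with a bare path error. A small wrapper can fail earlier and list what is actually available. The `stage_image` helper below is hypothetical (it is not part of the notebook or of 4D-Humans) and is demonstrated on temporary folders rather than the real `example_data/images`.

```python
import os
import shutil
import tempfile

def stage_image(pic, src_dir="example_data/images", dst_dir="demo_in"):
    """Copy one image into the input folder, failing early with a helpful message."""
    src = os.path.join(src_dir, pic)
    if not os.path.isfile(src):
        available = sorted(
            f for f in os.listdir(src_dir) if f.lower().endswith((".jpg", ".jpeg"))
        )
        raise FileNotFoundError(f"{pic} not found in {src_dir}; available: {available}")
    os.makedirs(dst_dir, exist_ok=True)
    return shutil.copy(src, os.path.join(dst_dir, pic))

# Demonstration on temporary folders standing in for the real ones.
root = tempfile.mkdtemp()
src_dir = os.path.join(root, "images")
dst_dir = os.path.join(root, "demo_in")
os.makedirs(src_dir)
open(os.path.join(src_dir, "sample.jpg"), "w").close()

staged = stage_image("sample.jpg", src_dir, dst_dir)
print(staged)
```

In the notebook, `stage_image(pic)` could stand in for the `shutil.copy` line, using the default folders.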

Now let's look at the still image with the estimated 3D model.

#@title display picture

from IPython.display import Image, display
import os
# https://colab.research.google.com/drive/1Ex4gE5v1bPR3evfhtG7sDHxQGsWwNwby?usp=sharing
output_images = ["demo_out/" + i for i in os.listdir("demo_out/") if ".png" in i]
for img in output_images:
  display(Image(img))
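The filter `".png" in i` matches any file name that merely contains `.png`, and `os.listdir` returns names in arbitrary order. A slightly stricter variant, sketched here with `pathlib`, matches on the file suffix and sorts the results so they display in a stable order. The `list_pngs` helper and the file names in the demonstration are made up.

```python
from pathlib import Path
import tempfile

def list_pngs(folder):
    """Return PNG paths in the folder, sorted by name, matching on the suffix only."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".png")

# Demonstration on a temporary folder standing in for demo_out.
tmp = Path(tempfile.mkdtemp())
for name in ["b_all.png", "a_all.png", "notes.png.txt"]:
    (tmp / name).touch()

names = [p.name for p in list_pngs(tmp)]
print(names)  # ['a_all.png', 'b_all.png']
```

In the notebook, the display loop could then iterate over `list_pngs("demo_out")` and call `display(Image(str(p)))` for each path.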

The side view is estimated at the same time.

Next, let's estimate 3D models from a video. It is a little tedious, but write the path of the video you want to process after !python track.py video.source= and run the cell.

#@title 4DHumans for video

# setting
reset_folder('outputs')

# run
!python track.py video.source='example_data/videos/backflip.mp4'
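To avoid editing the command by hand for every video, the command string can be built from a variable. The sketch below uses `shlex.quote` from the standard library so that paths containing spaces survive the shell. The `build_track_command` helper and the `my clip.mp4` path are hypothetical, and whether `track.py` itself accepts such paths has not been verified here.

```python
import shlex

def build_track_command(video_path):
    """Build the shell command used above for an arbitrary video path."""
    # shlex.quote adds quotes only when the path contains shell-unsafe characters.
    return "python track.py video.source=" + shlex.quote(video_path)

cmd = build_track_command("example_data/videos/my clip.mp4")
print(cmd)  # python track.py video.source='example_data/videos/my clip.mp4'
```

In a Colab cell, the resulting string can be run with `!{cmd}`.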

Now let's look at the video with the estimated 3D models.

#@title play video

from IPython.display import Video, display
import os

# reencoding
output_videos = ["outputs/" + i for i in os.listdir("outputs/") if ".mp4" in i]
video_path=output_videos[0]
! ffmpeg -y -i $video_path -loglevel error output.mp4

# play
display(Video('output.mp4', embed=True))
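Taking `output_videos[0]` raises an `IndexError` when tracking produced no video, and with several files the choice depends on directory order. A more defensive option, sketched below, picks the most recently modified `.mp4` and returns `None` when there is none. The `newest_mp4` helper is hypothetical and is demonstrated on a temporary folder with artificial timestamps.

```python
import os
import tempfile

def newest_mp4(folder):
    """Return the most recently modified .mp4 in the folder, or None if there is none."""
    videos = [
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if f.lower().endswith(".mp4")
    ]
    return max(videos, key=os.path.getmtime, default=None)

# Demonstration on a temporary folder standing in for outputs/.
tmp = tempfile.mkdtemp()
for name, mtime in [("old.mp4", 1_000), ("new.mp4", 2_000), ("log.txt", 3_000)]:
    path = os.path.join(tmp, name)
    open(path, "w").close()
    os.utime(path, (mtime, mtime))  # set access and modification times explicitly

picked = newest_mp4(tmp)
print(os.path.basename(picked))  # new.mp4
```

In the notebook, `video_path = newest_mp4("outputs")` could replace the indexing line, with a check for `None` before calling ffmpeg.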

Earlier 3D model estimation techniques could not handle poses like this one, where the body is upside down, but 4D-Humans estimates the 3D model impressively well.

(Original GitHub) https://github.com/shubham-goel/4D-Humans

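(Twitter post)

I wrote a blog post!

This time I introduce 4D-Humans, a technique that estimates a 3D model of a person from a monocular image. It can estimate 3D models even for unusual poses, such as the body being upside down, which used to be difficult.

Blog: https://t.co/3upStcXTuU pic.twitter.com/5MXAdAJrEy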
— cedro (@jun40vn) July 24, 2023