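My weekly dental treatment has wrapped up, so I have finally run out of reasons to go outside.

I'm KAZY, a Hoxo-m supporter.

Lately I worry about whether I can stay healthy while holed up in a six-tatami room. The lack of exercise bothers me too.

I'm sure the Hoxo-m ojisan (middle-aged guys) share the same worry.

By the way, apparently these days you can easily make a still image dance.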

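An AI that can make a source still image move around to match the motion in a reference video.
It reportedly uses attention mechanisms and the like to apply motion information extracted from the reference to the source image, with a GAN in which a Discriminator judges real versus fake as its main mechanism. https://t.co/YdsiRi0Enp pic.twitter.com/7xQ8oohqyo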
— 福田敦史 / Aillis CTO (@fukumimi014) December 13, 2020

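Then it hit me.

Use this technology to make the Hoxo-m ojisan move around.

That would fix their lack of exercise.

That in turn would put them in a good mood.

I get thanked and showered with rewards. 💰💰💰💰

A wonderful scenario.

I might be a genius.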

Today's output (for busy people)

doukana arekana koukana f:id:KAZYPinkSaurus:20210226134504p:plain

It's free material.

Feel free to take it home.

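A vibes-level feel for how the technique works

Apparently the Hoxo-m blog is actually supposed to be a tech blog (NOT a place to dump poems).

So I'll write down a vibes-level explanation of the technique used here.

The technique was published in a paper titled Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis.

It's from 2020.

Here is the Framework Overview figure from the paper, with a few comments added by me.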

f:id:KAZYPinkSaurus:20210214193929p:plain (a sloppy-of-sloppy explanation)

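Roughly, this is what is going on.

The goal is to put the Source Image $I_{s_i}$ into the pose of the Reference $I_r$.

As naturally as possible.

The flow devised to achieve that is:

1. Run a task called Human Mesh Recovery (HMR) on the images to estimate 3D mesh information from the 2D images.
2. Build $T$, a kind of correspondence map of body parts between the images that takes the 3D meshes into account (they call it the transformation flow).
3. Use $T$ to generate $I^{syn}_{t}$, which is $I_{s_i}$ warped into the pose of $I_r$.
4. Use Convolutional Autoencoder-like networks $G_{BG}, G_{SID}, G_{TSF}$ to produce the background image of $I_{s_i}$, the mask and person image for the person region of $I_{s_i}$, and the mask and person image for the person region of $I_r$ (a mechanism called $LWB/AttLWB$ appears to feed information from $G_{SID}$ into $G_{TSF}$ here).
5. Composite the background image and person image generated by $G_{BG}, G_{SID}$ into the image $\hat{I}_{s_i}$.
6. Composite the background image and person image generated by $G_{BG}, G_{TSF}$ into the image $\hat{I}^{syn}_{t}$ (this is the one we are after).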
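
As a minimal sketch of what the compositing in steps 5 and 6 could look like, assuming the mask is used as a per-pixel blending weight (my assumption; the steps only say the images are composited), here is a toy version on tiny grayscale images stored as nested lists. The function name `composite` and the pixel values are made up for illustration and are not taken from the iPERCore code.

```python
def composite(foreground, background, mask):
    """Blend per pixel: mask * foreground + (1 - mask) * background."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]


# Toy 2x2 grayscale "images" with values in [0, 1] (illustrative only).
person = [[0.9, 0.9], [0.9, 0.9]]
background = [[0.1, 0.1], [0.1, 0.1]]
mask = [[1.0, 0.0], [0.5, 0.0]]  # 1 = person pixel, 0 = background pixel

result = composite(person, background, mask)
print(result)
```

A mask value of 1 keeps the person pixel, 0 keeps the background pixel, and anything in between blends the two.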

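To get nice results from the flow above, training combines the following two wishes (roughly speaking):

- the wish to make $\hat{I}_{s_i}$ as close to $I_{s_i}$ as possible
- the wish to treat $\hat{I}^{syn}_{t}$ as fake and $I_{r}$ as real, and make the two indistinguishable (this is the GAN part)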
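
Written as a formula, the two wishes above might look roughly like the following. This is a simplified sketch under my own assumptions, not the paper's actual objective: the L1 reconstruction term, the standard GAN log-loss, the discriminator symbol $D$, and the weights $\lambda_{rec}, \lambda_{adv}$ are placeholders chosen for illustration.

$$
\mathcal{L}_{G} = \lambda_{rec} \, \lVert \hat{I}_{s_i} - I_{s_i} \rVert_1 \;-\; \lambda_{adv} \, \log D(\hat{I}^{syn}_{t}),
\qquad
\mathcal{L}_{D} = -\log D(I_{r}) - \log\bigl(1 - D(\hat{I}^{syn}_{t})\bigr)
$$

The generators minimize $\mathcal{L}_{G}$ (reconstruct the source well and fool $D$), while the discriminator minimizes $\mathcal{L}_{D}$ (tell $I_{r}$ apart from $\hat{I}^{syn}_{t}$).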

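Another key point, apparently, is that it goes through a 3D mesh so that 3D information is taken into account. Yet another is that $LWB/AttLWB$ feeds in the features used to reconstruct the person.

For the details, please read the paper.

The site below ↓ hosts the paper, code, datasets, and so on. www.impersonator.org

As of February 2021 it says Coming soon, but there also seems to be a project to develop an application. f:id:KAZYPinkSaurus:20210131194257p:plain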

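Collecting ojisan images

Back to the main topic.

I collect images of the ojisan.

When I put out a call for images, I was able to borrow images from three ojisan (probably free material).

Nice expressions.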

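Making the ojisan move (part 1)

Happily, a notebook that runs this technique on test data has been published.

With my 🐵 brain, I throw the ojisan in on the theory that swapping in my own images will probably just work.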

colab.research.google.com

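f:id:KAZYPinkSaurus:20210214211616p:plain (it says nothing will be saved, so anything goes, I guess....)

I run the notebook from top to bottom.

f:id:KAZYPinkSaurus:20210214213035p:plain (clicking through the cells from the top)

I reached the block that configures the video of Trump dancing.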

# This is a specific model name, and it will be used if you do not change it. This is the case of `trump`
model_id = "donald_trump_2"

# the source input information, here \" is the escape character for a double quote "
src_path = "\"path?=/content/iPERCore/assets/samples/sources/donald_trump_2/00000.PNG,name?=donald_trump_2\""


## the reference input information. There are three reference videos in this case.
# here \" is the escape character for a double quote "
# ref_path = "\"path?=/content/iPERCore/assets/samples/references/akun_1.mp4," \
#              "name?=akun_2," \
#              "pose_fc?=300\""

ref_path = "\"path?=/content/iPERCore/assets/samples/references/mabaoguo_short.mp4," \
             "name?=mabaoguo_short," \
             "pose_fc?=400\""

# ref_path = "\"path?=/content/iPERCore/assets/samples/references/akun_1.mp4,"  \
#              "name?=akun_2," \
#              "pose_fc?=300|" \
#              "path?=/content/iPERCore/assets/samples/references/mabaoguo_short.mp4," \
#              "name?=mabaoguo_short," \
#              "pose_fc?=400\""

print(ref_path)

!python -m iPERCore.services.run_imitator  \
  --gpu_ids     $gpu_ids       \
  --num_source  $num_source    \
  --image_size  $image_size    \
  --output_dir  $output_dir    \
  --model_id    $model_id      \
  --cfg_path    $cfg_path      \
  --src_path    $src_path      \
  --ref_path    $ref_path

Ah, I totally get it.

It looks like ref_path specifies the video and src_path the image, so I just need to swap src_path for an ojisan, right?

f:id:KAZYPinkSaurus:20210214214847p:plain (uploading ojisan 01)

I swapped src_path and model_id for the ojisan.

# This is a specific model name, and it will be used if you do not change it. This is the case of `trump`
model_id = "ossan01"

# the source input information, here \" is the escape character for a double quote "
src_path = "\"path?=/content/iPERCore/ossan-01.jpg,name?=ossan01\""


## the reference input information. There are three reference videos in this case.
# here \" is the escape character for a double quote "
# ref_path = "\"path?=/content/iPERCore/assets/samples/references/akun_1.mp4," \
#              "name?=akun_2," \
#              "pose_fc?=300\""

ref_path = "\"path?=/content/iPERCore/assets/samples/references/mabaoguo_short.mp4," \
             "name?=mabaoguo_short," \
             "pose_fc?=400\""

# ref_path = "\"path?=/content/iPERCore/assets/samples/references/akun_1.mp4,"  \
#              "name?=akun_2," \
#              "pose_fc?=300|" \
#              "path?=/content/iPERCore/assets/samples/references/mabaoguo_short.mp4," \
#              "name?=mabaoguo_short," \
#              "pose_fc?=400\""

print(ref_path)

!python -m iPERCore.services.run_imitator  \
  --gpu_ids     $gpu_ids       \
  --num_source  $num_source    \
  --image_size  $image_size    \
  --output_dir  $output_dir    \
  --model_id    $model_id      \
  --cfg_path    $cfg_path      \
  --src_path    $src_path      \
  --ref_path    $ref_path
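
The hand-escaped strings in the cell are easy to mistype, so here is a small sketch of building them programmatically. It assumes only the format visible in the cell: comma-separated `key?=value` fields, multiple entries joined with `|`, and the whole thing wrapped in literal double quotes. The helper names `input_spec` and `quoted_arg` are hypothetical and are not part of iPERCore.

```python
def input_spec(path, name, **options):
    """Build one 'path?=...,name?=...' entry; extra fields go in options."""
    fields = [("path", path), ("name", name), *options.items()]
    return ",".join(f"{key}?={value}" for key, value in fields)


def quoted_arg(*specs):
    """Join entries with '|' and wrap them in literal double quotes."""
    return '"' + "|".join(specs) + '"'


src_path = quoted_arg(
    input_spec("/content/iPERCore/ossan-01.jpg", "ossan01")
)
ref_path = quoted_arg(
    input_spec(
        "/content/iPERCore/assets/samples/references/mabaoguo_short.mp4",
        "mabaoguo_short",
        pose_fc=400,
    )
)

print(src_path)
print(ref_path)
```

The two printed values should match the strings written by hand in the cell, so they could be passed to `--src_path` and `--ref_path` the same way.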

Also, the previous cell sets num_source = 2, but since I have only one ojisan image this time I changed it to num_source = 1.

Then I run the Run the trump case block.

A few minutes of waiting...

----------------------MetaOutput----------------------
ossan01 imitates mabaoguo_short in ./results/primitives/ossan01/synthesis/imitations/ossan01-mabaoguo_short.mp4
------------------------------------------------------
Step 3: running imitator done.

That message appeared. Looks like it finished.
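
To check from Python that the video really exists, one could rebuild the path from the log line. This is a sketch that assumes the directory layout is exactly what the MetaOutput message shows; `imitation_path` is a hypothetical helper, not an iPERCore function, and `./results` is read off the log rather than from the notebook's `output_dir` setting.

```python
from pathlib import Path


def imitation_path(output_dir, src_name, ref_name):
    """Rebuild the output video path, mirroring the layout seen in the log."""
    return (
        Path(output_dir)
        / "primitives"
        / src_name
        / "synthesis"
        / "imitations"
        / f"{src_name}-{ref_name}.mp4"
    )


video = imitation_path("./results", "ossan01", "mabaoguo_short")
print(video.as_posix(), "exists" if video.exists() else "not found")
```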

youtu.be