Detecting objects in images and estimating their shapes with Mask R-CNN

cedro

3 years ago

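1. Introduction

Estimating what appears where in an image is called object detection. This post introduces Mask R-CNN, which goes one step further and also estimates the shape of each object (segmentation).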

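2. What is Mask R-CNN?

Mask R-CNN is an extension of Faster R-CNN, so let's start with Faster R-CNN.

Faster R-CNN uses a CNN to extract candidate object regions, then estimates the position of each region and its class probabilities at the same time. In other words, it puts a bounding box around each object and outputs the class the object belongs to.

Mask R-CNN additionally performs pixel-level classification inside each bounding box. In other words, it also estimates the shape of the object.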
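To make the relationship between a bounding box and a pixel-level mask concrete, here is a minimal NumPy sketch. It is not part of Mask R-CNN itself: the toy mask and the helper name `mask_to_bbox` are made up for illustration. It only shows that a mask carries the shape of the object, while the box is just the tightest rectangle around it.

```python
import numpy as np

def mask_to_bbox(mask):
    """Return (y1, x1, y2, x2) of the tightest box around the True pixels.

    y2 and x2 are exclusive. Returns None for an empty mask.
    """
    rows = np.any(mask, axis=1)  # which rows contain object pixels
    cols = np.any(mask, axis=0)  # which columns contain object pixels
    if not rows.any():
        return None
    y_idx = np.where(rows)[0]
    x_idx = np.where(cols)[0]
    return int(y_idx[0]), int(x_idx[0]), int(y_idx[-1]) + 1, int(x_idx[-1]) + 1

# Toy 6x8 "image" containing one L-shaped object (hypothetical data)
mask = np.zeros((6, 8), dtype=bool)
mask[1:5, 2] = True   # vertical stroke
mask[4, 2:6] = True   # horizontal stroke

bbox = mask_to_bbox(mask)
area = int(mask.sum())                                   # pixels covered by the object
box_area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])     # pixels covered by the box
print(bbox, area, box_area)
```

The box covers more pixels than the object does, and that gap is exactly the shape information a box-only detector cannot express.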

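3. Code

The code is on GitHub in a form that runs on Google Colab, so I will follow it in the explanation below. If you want to run it yourself, click this "link" and then click the "Colab on Web" button at the top of the notebook that opens.

First comes the setup: copying the code from GitHub, installing the required libraries, and so on. See the Google Colab notebook for details.

Next, the code below imports the libraries and the model and does the initial configuration.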

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("/content/Mask_RCNN")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

%matplotlib inline 

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

Line 36 of the code above, IMAGE_DIR = os.path.join(ROOT_DIR, "images"), specifies the folder that holds the files to run detection on. In this case it points to Mask_RCNN/images.
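A folder used from Colab can end up holding entries that are not images (a `.ipynb_checkpoints` directory, for example). One possible way to handle that is to list only files with image extensions. The sketch below is an assumption-laden illustration: the helper `list_image_files` and the extension tuple are hypothetical, and the demo runs against a temporary directory rather than the real IMAGE_DIR.

```python
import os
import tempfile

# Assumed set of extensions; adjust to whatever the folder really contains.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def list_image_files(image_dir):
    """Return the sorted names of regular files that look like images."""
    names = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        # Skip directories and anything without an image extension.
        if os.path.isfile(path) and name.lower().endswith(IMAGE_EXTENSIONS):
            names.append(name)
    return names

# Demo on a throwaway directory (stand-in for IMAGE_DIR)
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, '.ipynb_checkpoints'))
    for name in ('b.PNG', 'a.jpg', 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    demo_files = list_image_files(demo_dir)

print(demo_files)
```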

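Then the code below loads the pretrained weights and runs object detection. Classification uses the MS-COCO class list defined in class_names, which has 81 entries: the background class 'BG' plus 80 object categories.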

class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

# config.display()
config = InferenceConfig()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']

# Load image from the images folder
file_names = os.listdir(IMAGE_DIR)
file_names.sort()

for file_name in file_names:
    if file_name == '.ipynb_checkpoints':
        continue
    image = skimage.io.imread(os.path.join(IMAGE_DIR, file_name))

    # Run detection
    results = model.detect([image], verbose=1)

    # Visualize results
    r = results[0]
    visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                                class_names, r['scores'])
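The dictionary returned for each image is easier to reason about once it is printed as text. The sketch below assumes the layout described in the matterport documentation: 'rois' as an N x 4 array of (y1, x1, y2, x2), 'class_ids' and 'scores' of length N, and 'masks' as an H x W x N boolean array. The helper `summarize_detections` and the demo data are hypothetical stand-ins, so the block runs without the model.

```python
import numpy as np

def summarize_detections(r, class_names):
    """Return one (label, score, mask_pixels, box) tuple per detection."""
    summary = []
    for i, class_id in enumerate(r['class_ids']):
        y1, x1, y2, x2 = (int(v) for v in r['rois'][i])
        pixels = int(r['masks'][:, :, i].sum())  # size of the i-th instance mask
        summary.append((class_names[class_id], float(r['scores'][i]),
                        pixels, (y1, x1, y2, x2)))
    return summary

# Synthetic stand-in for results[0]; not real model output
demo_names = ['BG', 'person', 'handbag']
demo_masks = np.zeros((4, 5, 2), dtype=bool)
demo_masks[0:3, 0:2, 0] = True   # first instance: 6 pixels
demo_masks[2:4, 3:5, 1] = True   # second instance: 4 pixels
demo_result = {
    'rois': np.array([[0, 0, 3, 2], [2, 3, 4, 5]], dtype=np.int32),
    'class_ids': np.array([1, 2], dtype=np.int32),
    'scores': np.array([0.999, 0.781], dtype=np.float32),
    'masks': demo_masks,
}

demo_summary = summarize_detections(demo_result, demo_names)
for label, score, pixels, box in demo_summary:
    print(f"{label}: score={score:.3f}, mask pixels={pixels}, box={box}")
```

With the real model, the same helper could be called as summarize_detections(r, class_names) inside the loop.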

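This is from a Perfume video. All three members of Perfume are detected cleanly (each with a score of 100%). Possibly because of the shape of the costume, Kashiyuka's skirt is misdetected as a handbag, but its score is low at 78.1%.

Here too, all three members of Perfume are detected cleanly (each with a score of 100%), and the TV built into the wall is detected as well. For some reason Nocchi's left arm is misdetected as a baseball bat, but its score is low at 74.5%.

Skateboards, parasols, and various other objects registered in the class list are detected.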
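Since both false detections above have noticeably lower scores than the correct ones, one simple option is to drop detections below a score threshold before visualizing. The following is a minimal sketch, not the author's method: `filter_by_score` is a hypothetical helper, the threshold 0.9 is an arbitrary choice, and the demo dictionary is synthetic data that only mimics the scores reported above.

```python
import numpy as np

def filter_by_score(r, min_score):
    """Return a copy of a result dict keeping only detections with score >= min_score."""
    keep = r['scores'] >= min_score  # boolean selector over the N detections
    return {
        'rois': r['rois'][keep],
        'class_ids': r['class_ids'][keep],
        'scores': r['scores'][keep],
        'masks': r['masks'][:, :, keep],  # masks are indexed on the last axis
    }

# Synthetic result: three confident "person" detections and one doubtful "handbag"
demo_result = {
    'rois': np.array([[0, 0, 2, 2], [0, 2, 2, 4], [2, 0, 4, 2], [2, 2, 4, 4]]),
    'class_ids': np.array([1, 1, 1, 27]),
    'scores': np.array([1.0, 1.0, 1.0, 0.781], dtype=np.float32),
    'masks': np.zeros((4, 4, 4), dtype=bool),
}

filtered = filter_by_score(demo_result, min_score=0.9)
print(filtered['class_ids'], filtered['scores'])
```

A threshold that is too high will also discard correct but less certain detections, so the value is a trade-off to tune per use case. The matterport configuration also appears to expose a DETECTION_MIN_CONFIDENCE setting that could be raised in InferenceConfig, though that is not verified here.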

See you next time.

(Original) https://github.com/matterport/Mask_RCNN

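(Twitter posts)

I posted a clip in which Mask R-CNN is applied to frames from the middle of a video.

I wrote a blog post!

"Detecting objects in images and estimating their shapes with deep learning"

Estimating what appears where in an image is called object detection. This post introduces Mask R-CNN, which also estimates the shape of each object (segmentation). https://t.co/aEiOoYfam5 pic.twitter.com/PnNUjNNY31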
— cedro (@jun40vn) November 29, 2020

I posted a new video version to Twitter.

Detecting objects in a skateboarding video and estimating their shapes with deep learning

Blog: https://t.co/aEiOoYfam5 pic.twitter.com/aURwqph4QO
— cedro (@jun40vn) January 14, 2021