TFG | Pose Estimation of a Known Object

TensorFlow has released another fun open-source library: Graphics, aimed at inverse-graphics problems such as recovering each mesh's transform and rotation from a renderer's final output. This article is my reading notes on the official introductory project, Object pose estimation; all code is taken from the source files.

Overview

The overview diagram in the official tutorial is concise and easy to follow. The model is highly customizable; this tutorial, for example, uses a fully connected network. The official recommendation is to work through it in Colab, an online editor that requires no local environment setup and has enough compute for deep learning.

Object pose alignment

Estimating the position and orientation of a 3D object is a foundational technology in many industries, such as virtual and augmented reality, which need to respond to user edits of an object's parameters (for example, resizing an object with a knob). The example below shows how to use the TensorFlow Graphics library to estimate an object's rotation and translation.

This article inevitably contains some stiff translation and the occasional off-the-cuff remark; please bear with me.

This article covers two approaches:

  • Machine learning: train a simple neural network to estimate the pose (deep learning).
  • Mathematical optimization: solve for the pose directly by minimizing an alignment error.

The input is an ordered array of vertices (Vec3), and the output is a 7-element array. (So we have not even reached images yet; in a sense this is quite simple, and the computational load is modest.)
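To make that 7-value output concrete, here is a made-up example of a single target vector (TensorFlow Graphics stores quaternions with the scalar part last, i.e. [x, y, z, w]; the numbers below are arbitrary):

import numpy as np

# A hypothetical target: 4 quaternion components followed by 3 translation components.
rotation_q = np.array([0.0, 0.0, 0.0, 1.0], dtype=np.float32)  # identity rotation
translation = np.array([0.5, -1.0, 2.0], dtype=np.float32)      # shift along x, y, z

target = np.concatenate([rotation_q, translation])
print(target.shape)  # (7,)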

Setting up the environment

Install the required package with a simple pip command.

$ pip install tensorflow_graphics

To be added: how to install the GPU version of TensorFlow Graphics.

Next we import everything we need and load a simple 3D model:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import time

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from tensorflow_graphics.geometry.transformation import quaternion
from tensorflow_graphics.math import vector
from tensorflow_graphics.notebooks import threejs_visualization
from tensorflow_graphics.notebooks.resources import tfg_simplified_logo

tf.enable_eager_execution()

# Loads the Tensorflow Graphics simplified logo.
vertices = tfg_simplified_logo.mesh['vertices'].astype(np.float32)
faces = tfg_simplified_logo.mesh['faces']
num_vertices = vertices.shape[0]
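As a quick sanity check (my own addition, not in the official notebook), you can inspect what was just loaded:

# The simplified-logo mesh: per-vertex 3D positions and triangles given as vertex indices.
print('vertices:', vertices.shape, vertices.dtype)
print('faces:', faces.shape)
print('num_vertices:', num_vertices)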

1. Deep learning approach

Defining the model

From the 3D model we know the relative positions of all its vertices; based on that, we need a network that can predict a rotation (represented as a quaternion) and a translation (a 3D vector). The code below builds a fully connected network with just three Dense layers and defines the loss function. Note: this network has not been tuned; optimizing the model is outside the scope of this article.

# Constructs the model.
model = keras.Sequential()
model.add(layers.Flatten(input_shape=(num_vertices, 3)))
model.add(layers.Dense(64, activation=tf.nn.tanh))
model.add(layers.Dense(64, activation=tf.nn.relu))
model.add(layers.Dense(7))


def pose_estimation_loss(y_true, y_pred):
  """Pose estimation loss used for training.

  This loss measures the average of squared distance between some vertices
  of the mesh in 'rest pose' and the transformed mesh to which the predicted
  inverse pose is applied. Comparing this loss with a regular L2 loss on the
  quaternion and translation values is left as exercise to the interested
  reader.

  Args:
    y_true: The ground-truth value.
    y_pred: The prediction we want to evaluate the loss for.

  Returns:
    A scalar value containing the loss described in the description above.
  """
  # y_true.shape : (batch, 7)
  y_true_q, y_true_t = tf.split(y_true, (4, 3), axis=-1)
  # y_pred.shape : (batch, 7)
  y_pred_q, y_pred_t = tf.split(y_pred, (4, 3), axis=-1)

  # vertices.shape: (num_vertices, 3)
  # corners.shape: (num_vertices, 1, 3)
  corners = tf.expand_dims(vertices, axis=1)

  # transformed_corners.shape: (num_vertices, batch, 3)
  # q and t shapes get pre-pre-padded with 1's following standard broadcast rules.
  transformed_corners = quaternion.rotate(corners, y_pred_q) + y_pred_t

  # recovered_corners.shape: (num_vertices, batch, 3)
  recovered_corners = quaternion.rotate(transformed_corners - y_true_t,
                                        quaternion.inverse(y_true_q))

  # vertex_error.shape: (num_vertices, batch)
  vertex_error = tf.reduce_sum((recovered_corners - corners)**2, axis=-1)

  return tf.reduce_mean(vertex_error)


optimizer = keras.optimizers.Adam()
model.compile(loss=pose_estimation_loss, optimizer=optimizer)
model.summary()

The loss function:

  • Split the output: the (batch, 7) prediction is split into a (batch, 4) quaternion and a (batch, 3) translation.
  • Transform the rest-pose vertices of the 3D model with the predicted pose.
  • Undo that transformation using the ground-truth pose.
  • Compute the squared displacement (squared L2 distance) of every vertex.
  • Return the mean error over the batch.

Optimizer: Adam.
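The loss docstring leaves comparing this geometric loss with a plain L2 loss on the raw network outputs as an exercise; a minimal sketch of such a baseline (my own, with the hypothetical name naive_l2_loss) could look like this:

def naive_l2_loss(y_true, y_pred):
  # Penalize the raw quaternion/translation values directly. Unlike
  # pose_estimation_loss, this ignores the mesh geometry, so rotation and
  # translation errors end up weighted rather arbitrarily.
  return tf.reduce_mean(tf.reduce_sum((y_true - y_pred)**2, axis=-1))

Compiling the model with this loss instead would let you compare how the two objectives converge.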

Generating the dataset

With the model defined, we need plenty of data to train it. For each sample we apply a random rotation and translation to the 3D model. Note that these transformations are invertible, which is what the loss function relies on to undo them.

def generate_training_data(num_samples):
  # random_angles.shape: (num_samples, 3)
  random_angles = np.random.uniform(-np.pi, np.pi,
                                    (num_samples, 3)).astype(np.float32)

  # random_quaternion.shape: (num_samples, 4)
  random_quaternion = quaternion.from_euler(random_angles)

  # random_translation.shape: (num_samples, 3)
  random_translation = np.random.uniform(-2.0, 2.0,
                                         (num_samples, 3)).astype(np.float32)

  # data.shape : (num_samples, num_vertices, 3)
  data = quaternion.rotate(vertices[tf.newaxis, :, :],
                           random_quaternion[:, tf.newaxis, :]
                          ) + random_translation[:, tf.newaxis, :]

  # target.shape : (num_samples, 4+3)
  target = tf.concat((random_quaternion, random_translation), axis=-1)

  return np.array(data), np.array(target)

The function above lets us generate as many training samples as we like.

Note: the quaternion module converts Euler angles into quaternions and can also rotate the points of a 3D object.
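For instance (my own toy example, with arbitrary values), converting Euler angles to a quaternion and rotating a single point looks like this:

# 90 degrees about the z axis, expressed as Euler angles in radians.
angles = np.array([0.0, 0.0, np.pi / 2.0], dtype=np.float32)
q = quaternion.from_euler(angles)            # shape: (4,)

point = np.array([1.0, 0.0, 0.0], dtype=np.float32)
rotated = quaternion.rotate(point, q)        # roughly [0., 1., 0.]
print(rotated.numpy())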

num_samples = 10000

data, target = generate_training_data(num_samples)

print(data.shape) # (num_samples, num_vertices, 3): the vertices
print(target.shape) # (num_samples, 4+3): the quaternion and translation

Training

At last, we can start training!

The code below defines a callback class that prints details during training, such as progress and loss:

# Callback allowing to display the progression of the training task.
class ProgressTracker(keras.callbacks.Callback):

  def __init__(self, num_epochs, step=5):
    self.num_epochs = num_epochs
    self.current_epoch = 0.
    self.step = step
    self.last_percentage_report = 0

  def on_epoch_end(self, batch, logs={}):
    self.current_epoch += 1.
    training_percentage = int(self.current_epoch * 100.0 / self.num_epochs)
    if training_percentage - self.last_percentage_report >= self.step:
      print('Training ' + str(
          training_percentage) + '% complete. Training loss: ' + str(
              logs.get('loss')) + ' | Validation loss: ' + str(
                  logs.get('val_loss')))
      self.last_percentage_report = training_percentage

As training approaches convergence we want to reduce the learning rate (LR), which is what the following callback takes care of:

reduce_lr_callback = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=10,
    verbose=0,
    mode='auto',
    min_delta=0.0001,
    cooldown=0,
    min_lr=0)

Hook up both callbacks and start training:

# Everything is now in place to train.
EPOCHS = 100
pt = ProgressTracker(EPOCHS)
history = model.fit(
    data,
    target,
    epochs=EPOCHS,
    validation_split=0.2,
    verbose=0,
    batch_size=32,
    callbacks=[reduce_lr_callback, pt])

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.ylim([0, 1])
plt.legend(['loss', 'val loss'], loc='upper left')
plt.xlabel('Train epoch')
_ = plt.ylabel('Error [mean square distance]')

With that, we have a trained model.

Testing

The network is now trained and ready to use! The final output consists of two renderings: the first shows the object in its rest pose (pastel lemon) together with a randomly rotated and translated copy (pastel honeydew), so the difference between them is easy to see; the second again shows the rest-pose object, this time together with the copy after the inverse of the transformation predicted by the neural network has been applied. Hopefully the two overlap as much as possible (sigh).

Note: if you test repeatedly with different validation data, the object may appear at different sizes. This is because the network's predicted quaternion is not constrained to unit norm, and a non-unit quaternion also scales the object when used as a rotation. To keep the scale fixed, the quaternion needs to be normalized; this can be enforced as a constraint in the model or in the loss function (a small inference-time sketch appears after the prediction code below).

We start with a helper function that applies a transformation to a set of points:

# Applies a rotation (quaternion) and a translation to a set of points.
def transform_points(target_points, quaternion_variable, translation_variable):
  return quaternion.rotate(target_points,
                           quaternion_variable) + translation_variable

We use the threejs helper to display the 3D models:

class Viewer(object):

  def __init__(self, my_vertices):
    my_vertices = np.asarray(my_vertices)
    context = threejs_visualization.build_context()
    light1 = context.THREE.PointLight.new_object(0x808080)
    light1.position.set(10., 10., 10.)
    light2 = context.THREE.AmbientLight.new_object(0x808080)
    lights = (light1, light2)

    material = context.THREE.MeshLambertMaterial.new_object({
        'color': 0xfffacd,
    })

    material_deformed = context.THREE.MeshLambertMaterial.new_object({
        'color': 0xf0fff0,
    })

    camera = threejs_visualization.build_perspective_camera(
        field_of_view=30, position=(10.0, 10.0, 10.0))

    mesh = {'vertices': vertices, 'faces': faces, 'material': material}
    transformed_mesh = {
        'vertices': my_vertices,
        'faces': faces,
        'material': material_deformed
    }
    geometries = threejs_visualization.triangular_mesh_renderer(
        [mesh, transformed_mesh],
        lights=lights,
        camera=camera,
        width=400,
        height=400)

    self.geometries = geometries

  def update(self, transformed_points):
    self.geometries[1].getAttribute('position').copyArray(
        transformed_points.numpy().ravel().tolist())
    self.geometries[1].getAttribute('position').needsUpdate = True

Define a function that generates a random transformation:

def get_random_transform():
  # Forms a random translation.
  with tf.name_scope('translation_variable'):
    random_translation = tf.Variable(
        np.random.uniform(-2.0, 2.0, (3,)), dtype=tf.float32)

  # Forms a random quaternion.
  hi = np.pi
  lo = -hi
  random_angles = np.random.uniform(lo, hi, (3,)).astype(np.float32)
  with tf.name_scope('rotation_variable'):
    random_quaternion = tf.Variable(quaternion.from_euler(random_angles))

  return random_quaternion, random_translation

Run the prediction and display the result:

random_quaternion, random_translation = get_random_transform()

initial_orientation = transform_points(vertices, random_quaternion,
                                       random_translation).numpy()
viewer = Viewer(initial_orientation)

predicted_transformation = model.predict(initial_orientation[tf.newaxis, :, :])

predicted_inverse_q = quaternion.inverse(predicted_transformation[0, 0:4])
predicted_inverse_t = -predicted_transformation[0, 4:]

predicted_aligned = quaternion.rotate(initial_orientation + predicted_inverse_t,
                                      predicted_inverse_q)

viewer = Viewer(predicted_aligned)
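Following up on the earlier note about non-unit quaternions: if the aligned object appears scaled, one possible fix at inference time (my own sketch, not part of the tutorial) is to renormalize the predicted quaternion before inverting it:

# Renormalize the predicted quaternion so it encodes a pure rotation, no scale.
predicted_q = predicted_transformation[0, 0:4]
predicted_q = predicted_q / np.linalg.norm(predicted_q)

predicted_inverse_q = quaternion.inverse(predicted_q)
predicted_inverse_t = -predicted_transformation[0, 4:]
predicted_aligned = quaternion.rotate(initial_orientation + predicted_inverse_t,
                                      predicted_inverse_q)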

2. Mathematical optimization

Here the problem is tackled using mathematical optimization, which is another traditional way to approach the problem of object pose estimation. Given correspondences between the object in 'rest pose' (pastel lemon color) and its rotated and translated counterpart (pastel honeydew color), the problem can be formulated as a minimization problem. The loss function can for instance be defined as the sum of Euclidean distances between the corresponding points using the current estimate of the rotation and translation of the transformed object. One can then compute the derivative of this loss function with respect to the rotation and translation parameters, and follow the gradient direction until convergence. The following cell closely follows that procedure, and uses gradient descent to align the two objects. It is worth noting that although the results are good, there are more efficient ways to solve this specific problem. The interested reader is referred to the Kabsch algorithm for further details.

Note: press play multiple times to sample different test cases.
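As a point of comparison with the iterative approach below, here is a minimal NumPy sketch of the closed-form Kabsch alignment mentioned above (my own illustration, assuming exact point correspondences; the function and argument names are made up):

def kabsch_align(rest_points, moved_points):
  # Best-fit rotation R and translation t such that
  # R @ rest_points[i] + t ≈ moved_points[i] in the least-squares sense.
  rest_centroid = rest_points.mean(axis=0)
  moved_centroid = moved_points.mean(axis=0)
  # Cross-covariance of the centered point sets.
  h = (rest_points - rest_centroid).T @ (moved_points - moved_centroid)
  u, _, vt = np.linalg.svd(h)
  # Guard against reflections so the result is a proper rotation.
  d = np.sign(np.linalg.det(vt.T @ u.T))
  rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
  translation = moved_centroid - rotation @ rest_centroid
  return rotation, translation

For this toy problem, once transformed_points has been created below, calling kabsch_align(vertices, transformed_points.numpy()) would recover the pose in a single step; the gradient-descent version is still worth studying because it generalizes to losses with no closed-form solution.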

Define the loss and gradient functions:

def loss(target_points, quaternion_variable, translation_variable):
  transformed_points = transform_points(target_points, quaternion_variable,
                                        translation_variable)
  error = (vertices - transformed_points) / num_vertices
  return vector.dot(error, error)


def gradient_loss(target_points, quaternion, translation):
  with tf.GradientTape() as tape:
    loss_value = loss(target_points, quaternion, translation)
  return tape.gradient(loss_value, [quaternion, translation])

Create the optimizer.

learning_rate = 0.05
with tf.name_scope('optimization'):
  optimizer = tf.train.AdamOptimizer(learning_rate)

Initialize the random transformation, run the optimization and animate the result.

random_quaternion, random_translation = get_random_transform()

transformed_points = transform_points(vertices, random_quaternion,
                                      random_translation)

viewer = Viewer(transformed_points)

nb_iterations = 100
for it in range(nb_iterations):
  gradients_loss = gradient_loss(vertices, random_quaternion,
                                 random_translation)
  optimizer.apply_gradients(
      zip(gradients_loss, (random_quaternion, random_translation)))
  transformed_points = transform_points(vertices, random_quaternion,
                                        random_translation)

  viewer.update(transformed_points)
  time.sleep(0.1)

Summary

This is a very simple deep learning model, yet there is quite a lot of extra work to handle in the preprocessing stage. Keep at it!
