
Summary of some papers

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

Earliest representative work for depth map estimation.

  • A network consisting of a global coarse estimate and a local finer estimate
  • A scale-invariant loss function
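The scale-invariant loss compares log-depths so that a global scaling of the prediction is penalized less than errors in relative depth. A minimal NumPy sketch (the weighting `lam` follows the paper's formulation; `eps` is an assumed numerical guard):

```python
import numpy as np

def scale_invariant_loss(pred, target, lam=0.5, eps=1e-8):
    """Scale-invariant log loss (sketch).

    d_i = log(pred_i) - log(target_i)
    L   = mean(d^2) - lam * mean(d)^2
    """
    d = np.log(pred + eps) - np.log(target + eps)
    return np.mean(d ** 2) - lam * np.mean(d) ** 2
```

With `lam=1` the loss reduces to the variance of the log-difference, which is unchanged when the prediction is multiplied by a constant factor.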

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

Leverages an Encoder-3DLSTM-Decoder architecture to map 2D images to a 3D voxel model. The loss is binary cross-entropy over occupancy in 3D space, and IoU is used as the evaluation metric.

  • Input can be a single view or multiple views
  • Uses a voxel representation
  • Computationally costly
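The occupancy loss and IoU metric above can be sketched directly on dense occupancy grids (a minimal illustration, not the paper's implementation):

```python
import numpy as np

def voxel_bce(p, occ, eps=1e-7):
    # binary cross-entropy between predicted occupancy probabilities p
    # and ground-truth occupancy occ (both arrays of the same shape)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(occ * np.log(p) + (1 - occ) * np.log(1 - p))

def voxel_iou(p, occ, thresh=0.5):
    # intersection-over-union after thresholding the predicted probabilities
    pred = p > thresh
    gt = occ > 0.5
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union > 0 else 1.0
```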

A Point Set Generation Network for 3D Object Reconstruction from a Single Image

A model that maps a single image to a 3D point cloud. A point cloud is an unordered representation, so the same geometry can be represented by different point clouds; the challenge is designing losses that are invariant to this ordering. The paper introduces two point cloud losses: Chamfer distance and Earth Mover's distance.
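The Chamfer distance matches each point to its nearest neighbor in the other set, so it is invariant to point ordering. A brute-force NumPy sketch (quadratic in the number of points; real implementations use spatial data structures or GPU kernels):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # pairwise squared distances, shape (N, M)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # each point's nearest neighbor in the other set, averaged both ways
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```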

A single 2D image can correspond to many plausible 3D shapes. The paper introduces the Min-of-N (MoN) loss: the network is run with n random perturbations, and among the n reconstructions at least one prediction should be very close to the ground truth.

  • Earliest point cloud reconstruction from a single image
  • Loss functions for point cloud generation
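The MoN idea can be sketched as follows. Here `generator` is a hypothetical callable `(image, noise) -> (N, 3)` point array standing in for the network, and Chamfer distance is used as the per-sample loss:

```python
import numpy as np

def _chamfer(a, b):
    # symmetric Chamfer distance between point sets (N, 3) and (M, 3)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def min_of_n_loss(generator, image, gt_points, n=4, noise_dim=8, seed=0):
    # MoN: sample n noise vectors, generate n reconstructions,
    # and keep only the loss of the best one
    rng = np.random.default_rng(seed)
    return min(_chamfer(generator(image, rng.standard_normal(noise_dim)),
                        gt_points)
               for _ in range(n))
```

Only the best of the n samples is penalized, so the network is free to produce diverse shapes as long as one of them matches the ground truth.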

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Image

A model from an image to a 3D mesh, which is initialized from an ellipsoid mesh. The network consists of two parts:

  • Fully convolutional network for feature extraction from the image
  • GCN for representing the 3D mesh
    • C: point coordinates; P: image features; F: features for points
    • Perceptual feature pooling: extracts image features according to point coordinates C(i-1)
    • Skip-connection with features from the last time step F(i-1) -> G-ResNet
    • The output of G-ResNet (graph-based ResNet) serves as the output of the mesh deformation block -> C(i), F(i)
  • Graph unpooling: a coarse-to-fine approach to ensure more stable deformation
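Each mesh deformation block propagates per-vertex features with graph convolutions; G-ResNet stacks such layers with residual connections. A minimal sketch of one layer, using a dense 0/1 adjacency matrix for illustration (the hypothetical `W0`/`W1` are the learned self and neighbor weights):

```python
import numpy as np

def graph_conv(F, adj, W0, W1):
    """One graph convolution over vertex features F (V, d_in).

    f_i' = W0 f_i + sum over neighbors j of W1 f_j
    adj: dense (V, V) 0/1 adjacency matrix; W0, W1: (d_out, d_in).
    """
    return F @ W0.T + (adj @ F) @ W1.T
```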

The paper defines four losses to control mesh deformation:

  • Chamfer loss: controls point coordinates
  • Normal loss: smooths the surface
  • Laplacian regularization: keeps neighboring points moving coherently
  • Edge length regularization: prevents abnormal (flying) points
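The Laplacian regularization above can be sketched as penalizing changes in each vertex's Laplacian coordinate (the offset from the mean of its neighbors) across a deformation step; this is an illustrative sketch rather than the paper's exact formulation:

```python
import numpy as np

def laplacian_coords(verts, neighbors):
    # delta_i = v_i - mean of v_i's neighboring vertices
    # neighbors: list of index lists, one per vertex
    return np.stack([verts[i] - verts[nbrs].mean(axis=0)
                     for i, nbrs in enumerate(neighbors)])

def laplacian_reg(verts_before, verts_after, neighbors):
    # penalize how much the local shape (Laplacian coordinates)
    # changes between consecutive deformation steps
    diff = (laplacian_coords(verts_after, neighbors)
            - laplacian_coords(verts_before, neighbors))
    return np.mean(np.sum(diff ** 2, axis=1))
```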

Mesh R-CNN

A system that detects objects and generates a 3D mesh representation for each, achieved with an additional branch on top of Mask R-CNN.

  • A voxel branch that outputs a coarse voxel prediction from RoIAligned proposal features: a fully convolutional network that predicts a G×G×G voxel occupancy grid, represented with G×G feature maps (G channels).
  • A mesh refinement branch that converts the voxels to a mesh and uses a GCN to refine it. It first binarizes the voxel occupancy probabilities; the cubify operation then replaces each occupied voxel with a cuboid mesh of 8 vertices, 18 edges, and 12 triangular faces.
    • Vert Align: projects 3D vertex coordinates onto the image plane and computes image features for the points
    • GCN: propagates point features along mesh edges
    • Refine: updates vertex positions
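The Vert Align step can be sketched as a pinhole projection followed by feature sampling. This is an illustrative sketch, not the paper's implementation: it assumes normalized image coordinates in [-1, 1] and uses nearest-neighbor sampling where the paper uses bilinear interpolation:

```python
import numpy as np

def vert_align(verts, feat_map, focal=1.0):
    """Sample per-vertex image features for verts (V, 3), assuming z > 0.

    feat_map: (H, W, C) image feature map.
    """
    H, W, _ = feat_map.shape
    # pinhole projection to normalized coordinates (assumed in [-1, 1])
    u = focal * verts[:, 0] / verts[:, 2]
    v = focal * verts[:, 1] / verts[:, 2]
    # map to pixel indices and sample the nearest pixel
    px = np.clip(np.rint((u + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
    py = np.clip(np.rint((v + 1) / 2 * (H - 1)).astype(int), 0, H - 1)
    return feat_map[py, px]  # (V, C) per-vertex features
```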

The losses include the Mask R-CNN box/mask losses, a voxel prediction loss (binary cross-entropy for voxel occupancy), and losses for the refinement branch:

  • Chamfer distance between point clouds sampled from the predicted and ground-truth meshes
  • Normal distance
  • Shape regularizer (edge length regularization)

Experiments were conducted on the ShapeNet and Pix3D datasets.