Summary of some papers
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
One of the earliest representative deep-learning works on monocular depth map estimation.
- A network consisting of a global coarse estimate and a local finer estimate
- A scale-invariant loss function
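The scale-invariant loss compares log depths up to a global scale. A minimal sketch (the weighting λ and input shapes here are illustrative, not the paper's training configuration):

```python
import numpy as np

def scale_invariant_loss(pred, gt, lam=0.5):
    """Scale-invariant log-depth loss in the style of Eigen et al.

    d_i = log(pred_i) - log(gt_i)
    L = mean(d^2) - lam * mean(d)^2
    With lam = 1 the loss reduces to the variance of d, which is
    fully invariant to a global scaling of the predicted depths.
    """
    d = np.log(pred) - np.log(gt)
    return np.mean(d ** 2) - lam * np.mean(d) ** 2
```

With `lam=1.0`, multiplying every predicted depth by a constant leaves the loss unchanged, which is the property the paper exploits.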
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Leverages an Encoder-3DLSTM-Decoder architecture to map 2D images to a 3D voxel model. The loss is a per-voxel binary cross-entropy over occupancy of the 3D space; reconstruction quality is evaluated with a voxel IoU metric.
- Input can be single view or multi-view.
- Utilize voxel representation
- Computationally costly
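The occupancy loss and IoU metric above can be sketched as follows (threshold and grid shapes are illustrative):

```python
import numpy as np

def occupancy_bce(pred_prob, gt_occ, eps=1e-7):
    """Per-voxel binary cross-entropy: the occupancy training loss."""
    p = np.clip(pred_prob, eps, 1 - eps)
    y = gt_occ.astype(float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def voxel_iou(pred_prob, gt_occ, thresh=0.5):
    """IoU between thresholded predicted occupancy and binary ground truth."""
    pred = pred_prob > thresh
    gt = gt_occ.astype(bool)
    union = np.logical_or(pred, gt).sum()
    inter = np.logical_and(pred, gt).sum()
    return inter / union if union > 0 else 1.0
```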
A Point Set Generation Network for 3D Object Reconstruction from a Single Image
A model that maps a single image to a 3D point cloud. A point cloud is an unordered representation, so the same geometry can be represented by many different point sets; the paper introduces two permutation-invariant point cloud losses, the Chamfer distance and the Earth Mover's distance.
A single 2D image is also ambiguous: many plausible shapes can explain it. The paper therefore introduces the Min-of-N (MoN) loss: the network is run with n random perturbations, and the loss only requires that at least one of the n reconstructions is close to the ground truth.
- Earliest point cloud reconstruction from single image.
- Loss functions for point cloud generation.
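Both losses can be sketched for small, equal-sized point sets. Note the exact EMD via Hungarian matching below is only practical at toy scale; the paper uses an approximation for training:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3):
    each point is matched to its nearest neighbor in the other set."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def earth_mover_distance(a, b):
    """Exact EMD for equal-sized sets via an optimal one-to-one matching."""
    d = np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1))
    rows, cols = linear_sum_assignment(d)
    return d[rows, cols].mean()
```

Both are zero on identical sets regardless of point ordering, which is exactly the permutation invariance the unordered representation requires.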
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Image
A model from an image to a 3D mesh, which is initialized from an ellipsoid mesh. The network consists of two parts:
- Fully conv network for feature extraction from image
- GCN for representing 3D mesh.
- C: vertex coordinates; P: image features; F: vertex features
- Perceptual feature pooling: extract image features according to point coordinates C(i-1)
- Skip-connection with features from the last time step F(i-1) -> G-ResNet
- The output of G-ResNet (graph-based ResNet) serves as the output of the mesh deformation block -> C(i), F(i)
- Graph unpooling: a coarse-to-fine approach that adds vertices progressively to ensure more stable deformation
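Perceptual feature pooling can be sketched as projecting each vertex onto the image plane with the camera intrinsics and bilinearly sampling the conv feature map there. The intrinsics below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def perceptual_feature_pool(verts, feat_map, focal=248.0, cx=112.0, cy=112.0):
    """Bilinearly sample image features at the 2D projections of 3D vertices.

    verts:    (N, 3) camera-space vertex coordinates C(i-1), with z > 0
    feat_map: (H, W, C) conv feature map P
    focal/cx/cy: assumed pinhole intrinsics (illustrative only)
    Returns (N, C) pooled features, to be concatenated with F(i-1).
    """
    H, W, _ = feat_map.shape
    # Pinhole projection of each vertex to pixel coordinates.
    u = focal * verts[:, 0] / verts[:, 2] + cx
    v = focal * verts[:, 1] / verts[:, 2] + cy
    u = np.clip(u, 0, W - 1.001)
    v = np.clip(v, 0, H - 1.001)
    # Bilinear interpolation between the four surrounding feature cells.
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    f00, f01 = feat_map[v0, u0], feat_map[v0, u0 + 1]
    f10, f11 = feat_map[v0 + 1, u0], feat_map[v0 + 1, u0 + 1]
    return ((1 - dv) * (1 - du) * f00 + (1 - dv) * du * f01
            + dv * (1 - du) * f10 + dv * du * f11)
```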
The paper defines four losses to control the mesh deformation:
- Chamfer loss: control point coordinates
- Normal loss: smooth the surface
- Laplacian regularization: control neighborhood points
- Edge length regularization: prevent abnormal points
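The two regularizers can be sketched on a toy mesh; the adjacency and edge-list formats below are assumptions for illustration:

```python
import numpy as np

def laplacian_regularizer(verts, neighbors):
    """Penalize vertices that drift from the centroid of their neighbors,
    keeping the deformation locally smooth.

    neighbors: dict mapping vertex index -> list of adjacent vertex indices
    """
    loss = 0.0
    for i, nbrs in neighbors.items():
        centroid = verts[list(nbrs)].mean(axis=0)
        loss += np.sum((verts[i] - centroid) ** 2)
    return loss / len(neighbors)

def edge_length_regularizer(verts, edges):
    """Penalize long edges, discouraging abnormal 'flying' vertices.

    edges: (E, 2) array of vertex index pairs
    """
    diffs = verts[edges[:, 0]] - verts[edges[:, 1]]
    return np.mean(np.sum(diffs ** 2, axis=1))
```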
Mesh R-CNN
A system that detects objects and generates a 3D mesh representation for each, implemented as additional branches on top of Mask R-CNN.
- A voxel branch that outputs a coarse voxel prediction based on RoIAligned proposal features: a fully convolutional network that predicts a G×G×G voxel occupancy grid, represented as a G-channel G×G feature map (one channel per depth slice).
- A mesh refinement branch that converts voxels to a mesh and uses a GCN to refine it. It first binarizes the voxel occupancy probabilities; the cubify operation then replaces each occupied voxel with a cuboid triangle mesh of 8 vertices, 18 edges, and 12 faces, merging shared vertices and removing interior faces.
- VertAlign: projects 3D vertex coordinates onto the image plane and samples image features for each vertex.
- GCN: propagate point features
- Refine: updates vertex positions.
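A naive version of cubify — one cube per occupied voxel, without the vertex merging and interior-face removal the real op performs — can be sketched as:

```python
import numpy as np

# Corners of a unit cube, indexed so that corner i = (x, y, z) bits of i.
_CUBE_VERTS = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                       dtype=float)
# 12 triangles (two per cube face), indexing the 8 corners above.
_CUBE_FACES = np.array([
    [0, 1, 3], [0, 3, 2],  # x = 0 face
    [4, 6, 7], [4, 7, 5],  # x = 1 face
    [0, 4, 5], [0, 5, 1],  # y = 0 face
    [2, 3, 7], [2, 7, 6],  # y = 1 face
    [0, 2, 6], [0, 6, 4],  # z = 0 face
    [1, 5, 7], [1, 7, 3],  # z = 1 face
])

def cubify(prob, thresh=0.5):
    """Emit one cuboid mesh per voxel whose occupancy exceeds thresh."""
    verts, faces = [], []
    for idx in np.argwhere(prob > thresh):
        base = len(verts)
        verts.extend(_CUBE_VERTS + idx)   # place cube at voxel coordinates
        faces.extend(_CUBE_FACES + base)  # offset face indices per cube
    return np.array(verts), np.array(faces)
```

For a single occupied voxel this yields exactly the 8 vertices, 18 undirected edges (12 cube edges plus 6 face diagonals), and 12 triangular faces mentioned above.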
The losses of this paper include the Mask R-CNN box/mask losses, a voxel prediction loss (binary cross-entropy over voxel occupancy), and losses for the refinement branch:
- Chamfer distance between point clouds sampled from the predicted and ground-truth meshes
- Normal distance
- Shape regularizer
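The normal distance can be sketched with nearest-neighbor pairing between sampled points; this is a simplified, one-directional version of the loss, with the pairing reusing the same nearest-neighbor structure as the Chamfer term:

```python
import numpy as np

def normal_distance(pts_a, nrm_a, pts_b, nrm_b):
    """For each point in a, find its nearest neighbor in b and penalize
    misaligned unit normals via 1 - |cosine| (sign-agnostic)."""
    d2 = np.sum((pts_a[:, None, :] - pts_b[None, :, :]) ** 2, axis=-1)
    nn = d2.argmin(axis=1)                       # nearest neighbor in b
    cos = np.abs(np.sum(nrm_a * nrm_b[nn], axis=1))
    return 1.0 - cos.mean()
```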
Experiments were on the ShapeNet and Pix3D datasets.