
Summary of some papers

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

Earliest representative work for depth map estimation.

  • A network consisting of a global coarse estimate and a local finer estimate
  • A scale-invariant loss function
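The scale-invariant loss compares log-depths so that a global scaling of the prediction is penalized less than errors in relative depth. A minimal NumPy sketch (the weighting `lam` follows the paper's formulation; `eps` is an assumed numerical guard):

```python
import numpy as np

def scale_invariant_loss(pred, target, lam=0.5, eps=1e-8):
    """Scale-invariant log loss (sketch).

    d_i = log(pred_i) - log(target_i)
    L   = mean(d^2) - lam * mean(d)^2
    """
    d = np.log(pred + eps) - np.log(target + eps)
    return np.mean(d ** 2) - lam * np.mean(d) ** 2
```

With `lam=1` the loss reduces to the variance of the log-difference, which is unchanged when the prediction is multiplied by a constant factor.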

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

Leverages an Encoder-3DLSTM-Decoder architecture to map 2D images to a 3D voxel model. The loss is binary cross-entropy over occupancy in 3D space, and IoU is used as the evaluation metric.

  • Input can be a single view or multiple views
  • Uses a voxel representation
  • Computationally costly
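The occupancy loss and IoU metric above can be sketched directly on dense occupancy grids (a minimal illustration, not the paper's implementation):

```python
import numpy as np

def voxel_bce(p, occ, eps=1e-7):
    # binary cross-entropy between predicted occupancy probabilities p
    # and ground-truth occupancy occ (both arrays of the same shape)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(occ * np.log(p) + (1 - occ) * np.log(1 - p))

def voxel_iou(p, occ, thresh=0.5):
    # intersection-over-union after thresholding the predicted probabilities
    pred = p > thresh
    gt = occ > 0.5
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union > 0 else 1.0
```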

A Point Set Generation Network for 3D Object Reconstruction from a Single Image

A model that maps a single image to a 3D point cloud. A point cloud is an unordered representation, so the same geometry can be represented by different point clouds; the challenge is designing losses that are invariant to this ordering. The paper introduces two point cloud losses: Chamfer distance and Earth Mover's distance.
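The Chamfer distance matches each point to its nearest neighbor in the other set, so it is invariant to point ordering. A brute-force NumPy sketch (quadratic in the number of points; real implementations use spatial data structures or GPU kernels):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # pairwise squared distances, shape (N, M)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # each point's nearest neighbor in the other set, averaged both ways
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```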

A single 2D image can correspond to many plausible 3D shapes. The paper introduces the Min-of-N (MoN) loss: the network is run with n random perturbations, and among the n reconstructions at least one prediction should be very close to the ground truth.

  • Earliest point cloud reconstruction from a single image
  • Loss functions for point cloud generation
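The MoN idea can be sketched as follows. Here `generator` is a hypothetical callable `(image, noise) -> (N, 3)` point array standing in for the network, and Chamfer distance is used as the per-sample loss:

```python
import numpy as np

def _chamfer(a, b):
    # symmetric Chamfer distance between point sets (N, 3) and (M, 3)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def min_of_n_loss(generator, image, gt_points, n=4, noise_dim=8, seed=0):
    # MoN: sample n noise vectors, generate n reconstructions,
    # and keep only the loss of the best one
    rng = np.random.default_rng(seed)
    return min(_chamfer(generator(image, rng.standard_normal(noise_dim)),
                        gt_points)
               for _ in range(n))
```

Only the best of the n samples is penalized, so the network is free to produce diverse shapes as long as one of them matches the ground truth.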

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Image

A model from an image to a 3D mesh, which is initialized from an ellipsoid mesh. The network consists of two parts:

  • Fully convolutional network for feature extraction from the image
  • GCN for representing the 3D mesh
    • C: point coordinates; P: image features; F: features for points
    • Perceptual feature pooling: extracts image features according to point coordinates C(i-1)
    • Skip-connection with features from the last time step F(i-1) -> G-ResNet
    • The output of G-ResNet (graph-based ResNet) serves as the output of the mesh deformation block -> C(i), F(i)
  • Graph unpooling: a coarse-to-fine approach to ensure more stable deformation
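Each mesh deformation block propagates per-vertex features with graph convolutions; G-ResNet stacks such layers with residual connections. A minimal sketch of one layer, using a dense 0/1 adjacency matrix for illustration (the hypothetical `W0`/`W1` are the learned self and neighbor weights):

```python
import numpy as np

def graph_conv(F, adj, W0, W1):
    """One graph convolution over vertex features F (V, d_in).

    f_i' = W0 f_i + sum over neighbors j of W1 f_j
    adj: dense (V, V) 0/1 adjacency matrix; W0, W1: (d_out, d_in).
    """
    return F @ W0.T + (adj @ F) @ W1.T
```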

The paper defines four losses to control mesh deformation:

  • Chamfer loss: controls point coordinates
  • Normal loss: smooths the surface
  • Laplacian regularization: keeps neighboring points moving coherently
  • Edge length regularization: prevents abnormal (flying) points
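The Laplacian regularization above can be sketched as penalizing changes in each vertex's Laplacian coordinate (the offset from the mean of its neighbors) across a deformation step; this is an illustrative sketch rather than the paper's exact formulation:

```python
import numpy as np

def laplacian_coords(verts, neighbors):
    # delta_i = v_i - mean of v_i's neighboring vertices
    # neighbors: list of index lists, one per vertex
    return np.stack([verts[i] - verts[nbrs].mean(axis=0)
                     for i, nbrs in enumerate(neighbors)])

def laplacian_reg(verts_before, verts_after, neighbors):
    # penalize how much the local shape (Laplacian coordinates)
    # changes between consecutive deformation steps
    diff = (laplacian_coords(verts_after, neighbors)
            - laplacian_coords(verts_before, neighbors))
    return np.mean(np.sum(diff ** 2, axis=1))
```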

Mesh R-CNN

A system that detects objects and generates a 3D mesh representation for each, achieved with an additional branch on top of Mask R-CNN.

  • A voxel branch that outputs a coarse voxel prediction from RoIAligned proposal features: a fully convolutional network that predicts a G×G×G voxel occupancy grid, represented with G×G feature maps (G channels).
  • A mesh refinement branch that converts the voxels to a mesh and uses a GCN to refine it. It first binarizes the voxel occupancy probabilities; the cubify operation then replaces each occupied voxel with a cuboid mesh of 8 vertices, 18 edges, and 12 triangular faces.
    • Vert Align: projects 3D vertex coordinates onto the image plane and computes image features for the points
    • GCN: propagates point features along mesh edges
    • Refine: updates vertex positions
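The Vert Align step can be sketched as a pinhole projection followed by feature sampling. This is an illustrative sketch, not the paper's implementation: it assumes normalized image coordinates in [-1, 1] and uses nearest-neighbor sampling where the paper uses bilinear interpolation:

```python
import numpy as np

def vert_align(verts, feat_map, focal=1.0):
    """Sample per-vertex image features for verts (V, 3), assuming z > 0.

    feat_map: (H, W, C) image feature map.
    """
    H, W, _ = feat_map.shape
    # pinhole projection to normalized coordinates (assumed in [-1, 1])
    u = focal * verts[:, 0] / verts[:, 2]
    v = focal * verts[:, 1] / verts[:, 2]
    # map to pixel indices and sample the nearest pixel
    px = np.clip(np.rint((u + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
    py = np.clip(np.rint((v + 1) / 2 * (H - 1)).astype(int), 0, H - 1)
    return feat_map[py, px]  # (V, C) per-vertex features
```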

The losses include the Mask R-CNN box/mask losses, a voxel prediction loss (binary cross-entropy for voxel occupancy), and losses for the refinement branch:

  • Chamfer distance between point clouds sampled from the predicted and ground-truth meshes
  • Normal distance
  • Shape regularizer (edge length regularization)

Experiments were conducted on the ShapeNet and Pix3D datasets.