SPoVT: Semantic-Prototype Variational Transformer for Dense Point Cloud Semantic Completion

NeurIPS 2022

SPoVT is a Transformer-based network for semantic point cloud completion. Taking a partial point cloud and its semantic part labels as input, SPoVT derives point cloud features and an associated semantic prototype for each part. Its semantic VAE scheme allows point features to be resampled from each semantic prototype for completion, manipulation, and detailed mesh reconstruction.

Abstract

Point cloud completion is an active research topic in 3D vision and has been widely studied in recent years. Instead of directly predicting the missing points from the partial input, we introduce a Semantic-Prototype Variational Transformer (SPoVT), which takes both a partial point cloud and its semantic labels as input for semantic point cloud object completion. By observing and attending to geometric and semantic information as input features, SPoVT derives point cloud features and their semantic prototypes for completion. As a result, SPoVT not only performs point cloud completion at varying resolutions but also allows manipulation of different semantic parts of an object. Experiments on benchmark datasets quantitatively and qualitatively verify the effectiveness and practicality of the proposed model.


Point Cloud Semantic Completion

(Figure, two examples: Input | Ours | GT)

Given a partial point cloud and its semantic labels, SPoVT performs semantic point cloud completion. Each color in the point clouds indicates a distinct semantic part.
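To make the input format concrete, the sketch below pairs each point's coordinates with a one-hot encoding of its part label, giving one feature vector (token) per point. This encoding and the helper `make_input_features` are hypothetical illustrations; SPoVT's actual input embedding may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def make_input_features(
    points: Sequence[Point], labels: Sequence[int], num_parts: int
) -> List[List[float]]:
    """Concatenate each point's xyz with a one-hot encoding of its part label.

    Hypothetical helper: one simple way to expose geometry and semantics
    together as per-point features; not necessarily SPoVT's encoding.
    """
    if len(points) != len(labels):
        raise ValueError("each point needs exactly one semantic label")
    features = []
    for (x, y, z), label in zip(points, labels):
        if not 0 <= label < num_parts:
            raise ValueError(f"label {label} is outside [0, {num_parts})")
        one_hot = [0.0] * num_parts
        one_hot[label] = 1.0
        features.append([x, y, z] + one_hot)
    return features


partial = [(0.1, 0.2, 0.3), (0.5, -0.1, 0.0)]  # toy partial point cloud
labels = [0, 2]                                 # part label per point
feats = make_input_features(partial, labels, num_parts=4)
# feats[1] == [0.5, -0.1, 0.0, 0.0, 0.0, 1.0, 0.0]
```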


Mesh Reconstruction

(Figure, four examples: VRCNet (16k) | PoinTr (16k) | Ours (16k) | Ours (300k) | GT (16k))

Because SPoVT is variational, point features can be sampled repeatedly from the captured semantic prototypes during completion, so the completed point cloud can be made much denser (e.g., 300k points instead of 16k). Meshes reconstructed from these denser completions show higher fidelity.
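The resampling idea can be sketched as follows, assuming (as in a standard VAE) that each semantic prototype parameterizes a diagonal Gaussian over point features. The function name `sample_part_features` and the `(mu, logvar)` representation are illustrative assumptions, and the decoder that maps sampled features to 3D points is omitted.

```python
import math
import random
from typing import List, Sequence


def sample_part_features(
    mu: Sequence[float], logvar: Sequence[float], n: int, rng: random.Random
) -> List[List[float]]:
    """Draw n feature vectors from N(mu, diag(exp(logvar))).

    Uses the reparameterization z = mu + sigma * eps with eps ~ N(0, I).
    Assumed prototype format: mean and log-variance per feature dimension.
    """
    if len(mu) != len(logvar):
        raise ValueError("mu and logvar must have the same dimension")
    sigma = [math.exp(0.5 * lv) for lv in logvar]
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)] for _ in range(n)]


rng = random.Random(0)
mu, logvar = [0.0] * 8, [0.0] * 8  # toy 8-dimensional prototype for one part
coarse = sample_part_features(mu, logvar, 16, rng)  # low resolution
dense = sample_part_features(mu, logvar, 300, rng)  # same prototype, more samples
```

Because the samples are independent draws from the same distribution, the number of points is a free parameter at inference time, which is what lets a single prototype support completions at different resolutions in this sketch.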


Semantic Prototypes Interpolation

Instance-level Interpolation




SPoVT can interpolate the entire set of semantic prototypes between two point clouds in latent space.
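Instance-level interpolation can be illustrated as a linear blend applied to every prototype at once. The sketch below assumes each shape is summarized by a dict mapping part IDs to prototype vectors and uses plain linear interpolation; the representation and the helper names `lerp` and `interpolate_all` are hypothetical, and SPoVT's actual interpolation scheme may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: part ID -> prototype (latent) vector.
Prototypes = Dict[int, List[float]]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linear interpolation between two vectors (t=0 gives a, t=1 gives b)."""
    if len(a) != len(b):
        raise ValueError("prototype vectors must have the same dimension")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_all(src: Prototypes, dst: Prototypes, t: float) -> Prototypes:
    """Instance-level: blend every semantic prototype from src toward dst."""
    if src.keys() != dst.keys():
        raise ValueError("both shapes must have the same set of semantic parts")
    return {part: lerp(src[part], dst[part], t) for part in src}


shape_a = {0: [0.0, 2.0], 1: [1.0, 1.0]}
shape_b = {0: [2.0, 0.0], 1: [3.0, 5.0]}
# Sweeping t yields a sequence of in-between prototype sets; each would then
# be passed to the decoder (not shown) to produce a point cloud.
path = [interpolate_all(shape_a, shape_b, t) for t in (0.0, 0.5, 1.0)]
```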


Part-level Interpolation




SPoVT can interpolate a single semantic prototype between two point clouds in latent space.
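Part-level interpolation can be sketched in the same way, except that only one part's prototype is blended. Here the other parts are assumed to be copied unchanged from the source shape; that choice, the dict-of-vectors representation, and the helper names `lerp` and `interpolate_part` are hypothetical, not taken from SPoVT.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: part ID -> prototype (latent) vector.
Prototypes = Dict[int, List[float]]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linear interpolation between two vectors (t=0 gives a, t=1 gives b)."""
    if len(a) != len(b):
        raise ValueError("prototype vectors must have the same dimension")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_part(src: Prototypes, dst: Prototypes, part: int, t: float) -> Prototypes:
    """Part-level: blend only `part`; all other prototypes are copied from src."""
    if part not in src or part not in dst:
        raise KeyError(f"part {part} must exist in both shapes")
    out = {k: list(v) for k, v in src.items()}  # copy so src is not mutated
    out[part] = lerp(src[part], dst[part], t)
    return out


shape_a = {0: [0.0, 2.0], 1: [1.0, 1.0]}
shape_b = {0: [2.0, 0.0], 1: [3.0, 5.0]}
halfway = interpolate_part(shape_a, shape_b, part=1, t=0.5)
# halfway == {0: [0.0, 2.0], 1: [2.0, 3.0]}
```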



Overview of SPoVT




(left) The encoder learns point features for the input point cloud and a prototype for each semantic part. Given these features, a semantic VAE scheme captures the point cloud distribution of each semantic part, and an additional ratio predictor estimates how the points are distributed across the semantic parts. (right) The decoder samples point features for each semantic part according to the predicted point-number distribution and outputs a coarse point cloud, which a refinement network then refines into the final point cloud.
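The ratio predictor yields a distribution over semantic parts, while the decoder needs an integer number of points per part. One simple way to turn ratios into counts that sum exactly to the target resolution is largest-remainder rounding, sketched below. The helper `allocate_points` and this rounding rule are assumptions for illustration, not necessarily what SPoVT does.

```python
import math
from typing import List, Sequence


def allocate_points(ratios: Sequence[float], total: int) -> List[int]:
    """Split `total` points across parts in proportion to `ratios`.

    Largest-remainder method: floor each share, then hand the leftover
    points to the parts with the largest fractional parts.
    """
    s = sum(ratios)
    if s <= 0 or any(r < 0 for r in ratios):
        raise ValueError("ratios must be non-negative with a positive sum")
    shares = [r / s * total for r in ratios]
    counts = [math.floor(x) for x in shares]
    leftover = total - sum(counts)
    by_fraction = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


counts = allocate_points([0.5, 0.3, 0.2], 16384)  # e.g. three semantic parts
# counts == [8192, 4915, 3277]
```

Each count would then determine how many point features are sampled from the corresponding part's prototype before decoding.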



Citation

@inproceedings{huang2022spovt,
  title = {SPoVT: Semantic-Prototype Variational Transformer for Dense Point Cloud Semantic Completion},
  author = {Huang, Sheng Yu and Hsu, Hao-Yu and Wang, Frank},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {33934--33946},
  year = {2022},
}

Acknowledgement

This work is supported in part by Tron Future Tech Inc. and the National Science and Technology Council via NSTC-110-2634-F-002-052. We also thank the National Center for High-performance Computing (NCHC) for providing computational and storage resources.

This webpage template is borrowed from DreamFusion and SDFusion. Thanks for their beautiful websites!