"Qi, Ruizhongtai," | Search | Stanford Libraries

Catalog

See 1 result

Physical and digital books, media, journals, archives, and databases.

Deep learning on point clouds for 3D scene understanding

Qi, Ruizhongtai

[Stanford, California] : [Stanford University], 2018.

Point cloud is a commonly used geometric data type with many applications in computer vision, computer graphics and robotics. The availability of inexpensive 3D sensors has made point cloud data widely available and the current interest in self-driving vehicles has highlighted the importance of reliable and efficient point cloud processing. Due to its irregular format, however, current convolutional deep learning methods cannot be directly used with point clouds. Most researchers transform such data to regular 3D voxel grids or collections of images, which renders data unnecessarily voluminous and causes quantization and other issues. In this thesis, we present novel types of neural networks (PointNet and PointNet++) that directly consume point clouds, in ways that respect the permutation invariance of points in the input. Our network provides a unified architecture for applications ranging from object classification and part segmentation to semantic scene parsing, while being efficient and robust against various input perturbations and data corruption. We provide a theoretical analysis of our approach, showing that our network can approximate any set function that is continuous, and explain its robustness. In PointNet++, we further exploit local contexts in point clouds, investigate the challenge of non-uniform sampling density in common 3D scans, and design new layers that learn to adapt to varying sampling densities. The proposed architectures have opened doors to new 3D-centric approaches to scene understanding. We show how we can adapt and apply PointNets to two important perception problems in robotics: 3D object detection and 3D scene flow estimation. In 3D object detection, we propose a new frustum-based detection framework that achieves 3D instance segmentation and 3D amodal box estimation in point clouds. Our model, called Frustum PointNets, benefits from accurate geometry provided by 3D points and is able to canonicalize the learning problem by applying both non-parametric and data-driven geometric transformations on the inputs. Evaluated on large-scale indoor and outdoor datasets, our real-time detector significantly advances state of the art. In scene flow estimation, we propose a new deep network called FlowNet3D that learns to recover 3D motion flow from two frames of point clouds. Compared with previous work that focuses on 2D representations and optimizes for optical flow, our model directly optimizes 3D scene flow and shows great advantages in evaluations on real LiDAR scans. As point clouds are prevalent, our architectures are not restricted to the above two applications or even 3D scene understanding. This thesis concludes with a discussion on other potential application domains and directions for future research.

Guides

Course- and topic-based guides to collections, tools, and services.

No guide results found... Try a different search

Library website

Library info; guides & content by subject specialists

No website results found... Try a different search

Exhibits

Digital showcases for research and teaching.

No exhibits results found... Try a different search

EarthWorks

Geospatial content, including GIS datasets, digitized maps, and census data.

No earthworks results found... Try a different search

More search tools

Tools to help you discover resources at Stanford and beyond.