Data-Driven Scene Reconstruction, Understanding and Generation
Restricted (Penn State Only)
- Author:
- Liu, Jiachen
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 11, 2024
- Committee Members:
- Dongwon Lee, Professor in Charge/Director of Graduate Studies
- James Wang, Major Field Member
- Sharon Huang, Chair & Dissertation Advisor
- Fenglong Ma, Major Field Member
- Huijuan Xu, Outside Unit & Field Member
- Keywords:
- data-driven
- 3D scene reconstruction
- plane reconstruction
- generalizability
- scene layout generation
- floorplan
- Abstract:
- This dissertation explores data-driven approaches for 3D scene reconstruction and understanding, as well as generative artificial intelligence (AI) techniques for scene layout generation. Recovering 3D scene geometry from images and understanding the 3D space is a fundamental problem in computer vision (CV). Traditional methods typically tackle 3D reconstruction with hand-crafted feature detection, matching, and triangulation pipelines that rely on multi-view optimization and specific scene prior assumptions. However, these classic 3D reconstruction pipelines struggle with inputs exhibiting sparse image overlap, textureless areas, and repetitive patterns. In scene layout generation for computer-aided design and other applications, traditional rule-based approaches with complex optimization have been adopted. However, rule-based generation is constrained by its hand-designed rules and struggles to produce diverse, high-quality samples. With the surge of machine learning (ML) techniques, especially deep learning (DL), data-driven methods based on deep neural networks, e.g., convolutional neural networks (CNNs) and Transformers, have been widely explored to address 3D reconstruction and generative AI problems, delivering remarkably accurate, efficient, and diverse solutions. Despite this promising progress, data-driven methods still have notable limitations. In 3D scene reconstruction, structural primitives such as wireframes and planes are usually overlooked by mainstream methods, even though these primitives offer important structural regularity for representing and reconstructing man-made scenes, particularly in non-textured regions. Additionally, current 3D reconstruction methods are usually trained and evaluated on the same or similarly distributed environments, which hampers their ability to generalize to unseen or out-of-distribution data. In the context of scene layout generation, this dissertation focuses on floorplan synthesis as a representative task.
Previous methods typically reuse the rasterized representation of natural images to synthesize floorplan images. However, this representation is not well suited to preserving geometric coherence or attributes such as axis-aligned boundaries. Motivated by the benefits of data-driven methods and the aforementioned limitations of existing approaches, in this dissertation we first propose PlaneMVS, a learning-based framework for reconstructing the 3D planar structures of indoor scenes from multiple posed images. The method introduces a novel way of incorporating plane-based structural representations into 3D reconstruction pipelines in an end-to-end manner. We then tackle the limited generalizability of existing 3D plane reconstruction methods by proposing two systems, MonoPlane and ZeroPlane, which enhance the zero-shot transferability of plane reconstruction to diverse datasets. For scene layout generation, we focus on computer-aided floorplan design and propose an end-to-end graph-constrained generation framework with a vectorized representation. We further extend the rectangular room layout to a more general vectorized polygon representation by designing a progressive, two-stage synthesis pipeline. By developing these data-driven frameworks for scene reconstruction and layout generation, we significantly enhance the precision and generalizability of current 3D reconstruction systems, as well as the fidelity and diversity of synthesized floorplan layouts. These advances facilitate broad and valuable applications such as augmented reality (AR), robotics, and automated computer-aided design.