HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

1Peking University       2Peng Cheng Laboratory
*Corresponding Author

We propose HoloDreamer, a text-driven 3D scene generation framework that creates immersive, fully enclosed 3D scenes with high view consistency. It consists of two basic modules: Stylized Equirectangular Panorama Generation, which generates a stylized, high-quality equirectangular panorama from the input user prompt, and Enhanced Two-Stage Panorama Reconstruction, which employs 3D Gaussian Splatting for rapid 3D reconstruction of the panorama with enhanced integrity.

Results

Abstract

3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. Owing to the powerful generative capabilities of text-to-image diffusion models, which provide reliable priors, the creation of 3D scenes using only text prompts has become viable, significantly advancing research in text-driven 3D scene generation. To obtain multi-view supervision from 2D diffusion models, prevailing methods typically employ a diffusion model to generate an initial local image and then iteratively outpaint it to gradually build up the scene. Nevertheless, these outpainting-based approaches are prone to producing globally inconsistent scenes that lack a high degree of completeness, restricting their broader applications. To tackle these problems, we introduce HoloDreamer, a framework that first generates a high-definition panorama as a holistic initialization of the full 3D scene, then leverages 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes. Specifically, we propose Stylized Equirectangular Panorama Generation, a pipeline that combines multiple diffusion models to enable stylized and detailed equirectangular panorama generation from complex text prompts. Subsequently, Enhanced Two-Stage Panorama Reconstruction is introduced, conducting a two-stage optimization of 3D-GS to inpaint the missing regions and enhance the integrity of the scene. Comprehensive experiments demonstrate that our method outperforms prior works in overall visual consistency and harmony, as well as in reconstruction quality and rendering robustness, when generating fully enclosed scenes.

Method

First Image

Overview of our Stylized Equirectangular Panorama Generation. Given a user prompt, multiple diffusion models are used to generate stylized, high-quality panoramas. Additionally, the circular blending technique is applied to avoid cracks at the seam when rotating the panorama.
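To illustrate why circular blending matters: an equirectangular panorama wraps around horizontally, so its leftmost and rightmost columns are adjacent when viewed on a sphere, and any mismatch between them appears as a visible crack. The paper applies blending within the diffusion pipeline; the sketch below is only a minimal pixel-space analogue, assuming a simple linear feathering over a fixed border width (the function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def circular_blend(pano: np.ndarray, border: int = 32) -> np.ndarray:
    """Feather the left/right edges of an equirectangular panorama so the
    image is continuous across the 0/360-degree wrap-around seam.
    pano: (H, W, 3) image; border: width in pixels of the feathered strip."""
    out = pano.astype(np.float32)  # astype copies, so the input is untouched
    h, w = out.shape[:2]
    # Colour jump at the seam (column 0 meets column w-1 when wrapped).
    jump = out[:, 0] - out[:, w - 1]              # shape (H, 3)
    for j in range(border):
        weight = (border - j) / (2.0 * border)    # 1/2 at the seam, fading to ~0
        out[:, j] -= jump * weight                # pull left edge toward the seam mean
        out[:, w - 1 - j] += jump * weight        # push right edge toward the seam mean
    return np.clip(out, 0.0, 255.0)
```

After blending, column 0 and column W-1 both equal the mean of the original edge columns, so the wrapped image is continuous at the seam while pixels more than `border` columns away are left unchanged.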

Second Image

Overview of our Enhanced Two-Stage Panorama Reconstruction. We perform depth estimation on the generated panorama and then project the RGBD data to obtain a point cloud. Two types of cameras, base cameras and supplementary cameras, are used for projection and rendering in different scenarios, and three image sets are prepared for supervision at different stages of 3D-GS optimization. The rendered images of the scene reconstructed in the Pre Optimization stage are inpainted and used for supervision in the Transfer Optimization stage, yielding the final reconstructed scene.
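The first step of the reconstruction, back-projecting the RGBD panorama to a point cloud, amounts to casting one ray per pixel through the equirectangular mapping and scaling it by the estimated depth. A minimal sketch of this step, assuming a y-up coordinate frame and pixel-centre sampling (the function name and conventions are illustrative, not taken from the paper's code):

```python
import numpy as np

def panorama_to_point_cloud(rgb: np.ndarray, depth: np.ndarray):
    """Back-project an equirectangular RGBD panorama into a coloured point cloud.
    rgb: (H, W, 3) colours; depth: (H, W) per-pixel metric depth.
    Returns (N, 3) points and (N, 3) colours, N = H * W."""
    h, w = depth.shape
    # Pixel centres -> spherical angles (equirectangular mapping).
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    lon = (u + 0.5) / w * 2.0 * np.pi - np.pi      # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v + 0.5) / h * np.pi      # latitude in (-pi/2, pi/2)
    # Unit ray directions on the sphere (y axis points up).
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    points = dirs * depth[..., None]               # scale each ray by its depth
    return points.reshape(-1, 3), rgb.reshape(-1, 3)
```

This point cloud then serves as the initialization for the Gaussians, after which the two-stage 3D-GS optimization described above fills in the regions the panorama projection leaves unobserved.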

BibTeX

@misc{zhou2024holodreamerholistic3dpanoramic,
      title={HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions}, 
      author={Haiyang Zhou and Xinhua Cheng and Wangbo Yu and Yonghong Tian and Li Yuan},
      year={2024},
      eprint={2407.15187},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.15187}, 
}