We demonstrate a training-free strategy, termed Multi-plane Synchronization, which enables pre-trained 2D diffusion models to generate seam-continuous multi-plane panoramic representations such as cube maps and skyboxes. The strategy works by extending the model's 2D spatial operators to the omnidirectional image domain, making them equivariant to omnidirectional translations.
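To make the idea concrete, below is a minimal sketch (assumed names, not the authors' implementation) of how a pretrained 2D convolution can be made seam-aware on a cube map: before convolving, each face is padded with border pixels from its neighboring faces, so the receptive field crosses face boundaries. For brevity, only the four side faces in a horizontal ring are handled; a full cube map would also need appropriately rotated borders from the top and bottom faces.

```python
# A minimal sketch (hypothetical `ring_pad`/`synced_conv` names) of seam-aware
# convolution on a cube map. The four side faces are stored as (B, 4, C, H, W)
# in ring order front -> right -> back -> left.
import torch
import torch.nn as nn
import torch.nn.functional as F

def ring_pad(faces: torch.Tensor, pad: int) -> torch.Tensor:
    """Pad each face horizontally with pixels from its ring neighbors."""
    left_nbr = torch.roll(faces, shifts=1, dims=1)    # face to the left of face i
    right_nbr = torch.roll(faces, shifts=-1, dims=1)  # face to the right of face i
    return torch.cat([left_nbr[..., -pad:], faces, right_nbr[..., :pad]], dim=-1)

def synced_conv(faces: torch.Tensor, conv: nn.Conv2d) -> torch.Tensor:
    """Apply pretrained 2D conv weights so the receptive field crosses seams."""
    b, f, c, h, w = faces.shape
    pad = conv.kernel_size[0] // 2
    x = ring_pad(faces, pad)            # (B, 4, C, H, W + 2*pad)
    x = F.pad(x, (0, 0, pad, pad))      # zero-pad top/bottom edges in this sketch
    out = F.conv2d(x.flatten(0, 1), conv.weight, conv.bias)  # weights unchanged
    return out.view(b, f, -1, h, w)

# Usage: a stride-1, odd-kernel conv reused across faces without retraining.
conv = nn.Conv2d(3, 16, kernel_size=3)
faces = torch.randn(1, 4, 3, 64, 64)
print(synced_conv(faces, conv).shape)  # torch.Size([1, 4, 16, 64, 64])
```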
Based on this strategy, we further present DreamCube, a diffusion model for RGB-D cube map generation from single-view input. By operating on cube maps with synchronized operators, DreamCube maximizes the reuse of 2D diffusion model weights, achieving high-quality appearance and accurate geometry. Extensive experiments demonstrate the effectiveness of our approach on panoramic image generation, panoramic depth estimation, and 3D scene generation.
Synchronizing the spatial operators (attention, 2D convolutions, group normalizations) of the diffusion U-Net and VAE enables seam-continuous cube map generation without fine-tuning.
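As a concrete example of one such synchronized operator, the sketch below (an illustrative assumption, not the exact implementation) wraps a pretrained `nn.GroupNorm` so that its mean/variance statistics are computed jointly over all six faces of a cube map rather than per face; per-face statistics would otherwise introduce visible brightness and contrast seams. Synchronized attention and convolution follow the same pattern of sharing information across faces.

```python
# A minimal sketch, assuming face-batched activations of shape (B*6, C, H, W).
import torch
import torch.nn as nn

class SyncedGroupNorm(nn.Module):
    """GroupNorm with statistics shared across the six faces of a cube map."""

    def __init__(self, gn: nn.GroupNorm, num_faces: int = 6):
        super().__init__()
        self.gn = gn          # reuse the pretrained scale/shift parameters
        self.f = num_faces

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bf, c, h, w = x.shape
        b = bf // self.f
        # Fold the six faces into the width axis so mean/var are computed jointly.
        x = x.view(b, self.f, c, h, w).permute(0, 2, 3, 1, 4).reshape(b, c, h, self.f * w)
        x = self.gn(x)  # identical weights, cube-level statistics
        return x.view(b, c, h, self.f, w).permute(0, 3, 1, 2, 4).reshape(bf, c, h, w)

# Usage: drop-in replacement for a pretrained GroupNorm inside the U-Net/VAE.
gn = nn.GroupNorm(num_groups=8, num_channels=32)
x = torch.randn(6, 32, 16, 16)  # one cube map, six faces
print(SyncedGroupNorm(gn)(x).shape)  # torch.Size([6, 32, 16, 16])
```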
Training and inference framework of DreamCube for RGB-D cube map generation:
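For reference, here is a heavily simplified, hypothetical view of the inference loop: RGB and depth are denoised jointly as concatenated latent channels for all six faces, so every synchronized operator inside the U-Net sees the whole cube map at each step. The `unet`/`scheduler` interfaces follow the diffusers convention; the channel layout and conditioning are assumptions, not the exact DreamCube configuration.

```python
# A minimal sketch of joint RGB-D cube-map denoising (assumed interfaces).
import torch

@torch.no_grad()
def generate_rgbd_cubemap(unet, scheduler, cond, n_faces=6, ch=8, size=64):
    # `ch` assumes a 4-channel RGB latent concatenated with a 4-channel depth
    # latent; `unet` is the synchronized U-Net operating on all faces at once.
    latents = torch.randn(n_faces, ch, size, size)
    for t in scheduler.timesteps:
        noise_pred = unet(latents, t, cond)                    # one synced pass
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents  # decode RGB and depth with the synchronized VAE afterwards
```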
Results of Multi-plane Synchronization on existing pre-trained 2D diffusion models, including SD2, SDXL, and Marigold:
Out-of-domain RGB-D panorama generation from single-view inputs:
Free-angle observation. Drag with the mouse to rotate the view (Left: RGB, Right: Euclidean Depth).
Panoramic overview (Left: RGB, Right: Euclidean Depth).
@article{huang2025dreamcube,
title={{DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization}},
author={Huang, Yukun and Zhou, Yanning and Wang, Jianan and Huang, Kaiyi and Liu, Xihui},
year={2025},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CV},
}