Research
My research centers on computer vision, graphics, and machine learning, including 3D and video generation, pedestrian and object recognition, and low-level vision.
I am particularly interested in 3D generation of objects, avatars, and scenes, with the ultimate goal of creating immersive and fantastic digital worlds.
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
Yukun Huang, Jiwen Yu, Yanning Zhou, Jianan Wang, Xintao Wang, Pengfei Wan, Xihui Liu
arXiv 2025
project page / arXiv / code / model / dataset
Repurposing a pre-trained 2D flow matching model for panorama generation, perception, and completion, enabling graphics-ready 3D scene creation.
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
Kaiyi Huang, Yukun Huang, Xuefei Ning, Zinan Lin, Yu Wang, Xihui Liu
AAAI 2026
project page / arXiv / code / video
An iterative, self-correcting multi-agent collaborative framework for compositional text-to-video generation.
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
Yunhan Yang, Yufan Zhou, Yuan-Chen Guo, Zi-Xin Zou, Yukun Huang, Ying-Tian Liu, Hao Xu, Ding Liang, Yan-Pei Cao, Xihui Liu
SIGGRAPH Asia 2025
project page / arXiv / code / model / demo
Part-aware 3D object generation framework designed to achieve high semantic decoupling among components while maintaining robust structural cohesion.
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
Ruili Feng, Han Zhang, Zhantao Yang, Jie Xiao, Zhilei Shu, Zhiheng Liu, Andy Zheng, Yukun Huang, Yu Liu, Hongyang Zhang
NeurIPS 2025
project page / arXiv
Foundational realistic world simulator capable of generating infinitely long 720p high-fidelity real-scene video streams with real-time, responsive control.
DreamCube: 3D Panorama Generation via Multi-plane Synchronization
Yukun Huang, Yanning Zhou, Jianan Wang, Kaiyi Huang, Xihui Liu
ICCV 2025
project page / arXiv / code / model / video
RGB-D cubemap generation using pre-trained 2D diffusion and multi-plane synchronized operators, with applications in panoramic depth estimation and 3D scene synthesis.
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
Yukun Huang, Jianan Wang, Ailing Zeng, Zheng-Jun Zha, Lei Zhang, Xihui Liu
TPAMI 2025
project page / arXiv / code / model
Expressive full-body 3D avatar generation from 2D diffusion using a hybrid 3D Gaussian avatar representation and skeleton-guided score distillation.
DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation
Yunhan Yang, Shuo Chen, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Edmund Y. Lam, Hengshuang Zhao, Tong He, Xihui Liu
TPAMI 2025
arXiv
Integrating multi-view conditions into image and video diffusion models to generate controllable novel views for 3D object reconstruction.
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Kaiyi Huang, Yukun Huang, Xintao Wang, Zinan Lin, Xuefei Ning, Pengfei Wan, Di Zhang, Yu Wang, Xihui Liu
arXiv 2025
project page / arXiv
FilMaster pioneers AI-driven filmmaking by automating the entire pipeline with cinematic principles.
HoloPart: Generative 3D Part Amodal Segmentation
Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, Xihui Liu
arXiv 2025
project page / arXiv / code / demo
Decomposing a 3D shape into complete, semantically meaningful parts.
SAMPart3D: Segment Any Part in 3D Objects
Yunhan Yang, Yukun Huang, Yuan-Chen Guo, Liangjun Lu, Xiaoyang Wu, Edmund Y. Lam, Yan-Pei Cao, Xihui Liu
arXiv 2024
project page / arXiv / code / dataset (PartObjaverse-Tiny)
Zero-shot, multi-granularity 3D part segmentation using vision foundation models to learn scalable, flexible 3D features without label sets.
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu
CVPR 2024
project page / arXiv / code / model
Integrating multi-view conditions into pre-trained 2D diffusion models to generate controllable novel views for 3D object reconstruction.
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang, Jianan Wang, Yukai Shi, Boshi Tang, Xianbiao Qi, Lei Zhang
ICLR 2024
arXiv / paper
Analyzing the drawbacks of random timestep sampling in score distillation sampling (SDS) and proposing a non-increasing timestep sampling strategy.
TOSS: High-quality Text-guided Novel View Synthesis from a Single Image
Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum
ICLR 2024
project page / arXiv / code / model
Utilizing text as semantic guidance to further constrain the solution space of novel view synthesis (NVS), generating more plausible, controllable, multiview-consistent novel-view images from a single image.
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang, Jianan Wang, Ailing Zeng, He Cao, Xianbiao Qi, Yukai Shi, Zheng-Jun Zha, Lei Zhang
NeurIPS 2023
project page / arXiv / code / poster / gallery
High-quality animatable avatar generation from text via 3D-consistent, occlusion-aware score distillation sampling, ready for 3D scene composition with diverse interactions.
Professional Service
Reviewer: CVPR / ICCV / ECCV / ICLR / NeurIPS / ICML / SIGGRAPH Asia / TPAMI / TIP / TMM / Neurocomputing / ...
Teaching Assistant: Embodied AI 101 (Summer 2025), HKU; Computer Vision (Fall 2022), USTC