Research
My research centers on computer vision, graphics, and machine learning, including 3D and video generation, pedestrian and object recognition, and low-level vision.
I am particularly interested in 3D generation of objects, avatars, and scenes, with the ultimate goal of creating immersive and fantastic digital worlds.
DreamCube: 3D Panorama Generation via Multi-plane Synchronization
Yukun Huang, Yanning Zhou, Jianan Wang, Kaiyi Huang, Xihui Liu
ICCV 2025
project page / arXiv / code / model / video
RGB-D cubemap generation using pre-trained 2D diffusion and multi-plane synchronized operators, with applications in panoramic depth estimation and 3D scene synthesis.
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
Yukun Huang, Jianan Wang, Ailing Zeng, Zheng-Jun Zha, Lei Zhang, Xihui Liu
TPAMI 2025
project page / arXiv / code / model
Expressive full-body 3D avatar generation from 2D diffusion using hybrid 3D Gaussian avatar representation and skeleton-guided score distillation.
DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation
Yunhan Yang, Shuo Chen, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Edmund Y. Lam, Hengshuang Zhao, Tong He, Xihui Liu
TPAMI 2025
arXiv
Integrating multi-view conditions into image and video diffusion models to generate controllable novel views for 3D object reconstruction.
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Kaiyi Huang, Yukun Huang, Xintao Wang, Zinan Lin, Xuefei Ning, Pengfei Wan, Di Zhang, Yu Wang, Xihui Liu
arXiv 2025
project page / arXiv
FilMaster pioneers AI-driven filmmaking by automating the entire pipeline with cinematic principles.
HoloPart: Generative 3D Part Amodal Segmentation
Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, Xihui Liu
arXiv 2025
project page / arXiv / code / demo
Decomposing a 3D shape into complete, semantically meaningful parts.
SAMPart3D: Segment Any Part in 3D Objects
Yunhan Yang, Yukun Huang, Yuan-Chen Guo, Liangjun Lu, Xiaoyang Wu, Edmund Y. Lam, Yan-Pei Cao, Xihui Liu
arXiv 2024
project page / arXiv / code / dataset (PartObjaverse-Tiny)
Zero-shot, multi-granularity 3D part segmentation using vision foundation models to learn scalable, flexible 3D features without label sets.
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
Kaiyi Huang, Yukun Huang, Xuefei Ning, Zinan Lin, Yu Wang, Xihui Liu
arXiv 2024
project page / arXiv / code / video
An iterative, self-correcting multi-agent collaborative framework for compositional text-to-video generation.
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu
CVPR 2024
project page / arXiv / code / model
Integrating multi-view conditions into pre-trained 2D diffusion models to generate controllable novel views for 3D object reconstruction.
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang, Jianan Wang, Yukai Shi, Boshi Tang, Xianbiao Qi, Lei Zhang
ICLR 2024
arXiv / paper
Analyzing the drawbacks of random timestep sampling in score distillation sampling (SDS) and proposing a non-increasing timestep sampling strategy.
TOSS: High-quality Text-guided Novel View Synthesis from a Single Image
Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum
ICLR 2024
project page / arXiv / code / model
Using text as semantic guidance to further constrain the solution space of novel view synthesis (NVS), generating more plausible, controllable, and multi-view-consistent novel views from a single image.
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang, Jianan Wang, Ailing Zeng, He Cao, Xianbiao Qi, Yukai Shi, Zheng-Jun Zha, Lei Zhang
NeurIPS 2023
project page / arXiv / code / poster / gallery
High-quality animatable avatar generation from text via 3D-consistent, occlusion-aware score distillation sampling, ready for 3D scene composition with diverse interactions.
Professional Service
Reviewer: NeurIPS 2025; SIGGRAPH Asia 2025; ICCV 2025; ICLR 2025; CVPR 2025; ICML 2024; ECCV 2024; TPAMI; TIP; TMM; Neurocomputing; etc.
Teaching Assistant: Embodied AI 101 (Summer 2025), HKU; Computer Vision (Fall 2022), USTC.