Yukun Huang (黄宇坤)

I'm currently a postdoctoral fellow at the Institute of Data Science (IDS), University of Hong Kong, advised by Prof. Xihui Liu and Prof. Yi Ma.

Before that, I obtained my Ph.D. from the University of Science and Technology of China (中国科学技术大学) in 2023, supervised by Prof. Xueyang Fu and Prof. Zheng-Jun Zha. I also gained valuable experience working at the International Digital Economy Academy (IDEA), Tencent ARC Lab, and Tencent Youtu Lab.

Email  /  Scholar  /  GitHub  /  X (Twitter)

Research

My research centers on computer vision, graphics, and machine learning, spanning 3D and video generation, pedestrian and object recognition, and low-level vision.

I am particularly interested in 3D generation of objects, avatars, and scenes, with the ultimate goal of creating immersive and fantastical digital worlds.

Recent Publications

DreamCube: 3D Panorama Generation via Multi-plane Synchronization
Yukun Huang, Yanning Zhou, Jianan Wang, Kaiyi Huang, Xihui Liu
ICCV 2025
project page / arXiv / code / model / video

RGB-D cubemap generation using pre-trained 2D diffusion models and multi-plane synchronized operators, with applications in panoramic depth estimation and 3D scene synthesis.

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
Yukun Huang, Jianan Wang, Ailing Zeng, Zheng-Jun Zha, Lei Zhang, Xihui Liu
TPAMI 2025
project page / arXiv / code / model

Expressive full-body 3D avatar generation from 2D diffusion models using a hybrid 3D Gaussian avatar representation and skeleton-guided score distillation.

DreamComposer++: Empowering Diffusion Models with Multi-View Conditions for 3D Content Generation
Yunhan Yang, Shuo Chen, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Edmund Y. Lam, Hengshuang Zhao, Tong He, Xihui Liu
TPAMI 2025
arXiv

Integrating multi-view conditions into image and video diffusion models to generate controllable novel views for 3D object reconstruction.

FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Kaiyi Huang, Yukun Huang, Xintao Wang, Zinan Lin, Xuefei Ning, Pengfei Wan, Di Zhang, Yu Wang, Xihui Liu
arXiv 2025
project page / arXiv

An AI-driven filmmaking system that automates the entire film generation pipeline while following established cinematic principles.

HoloPart: Generative 3D Part Amodal Segmentation
Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, Xihui Liu
arXiv 2025
project page / arXiv / code / demo

Decomposing a 3D shape into complete, semantically meaningful parts.

SAMPart3D: Segment Any Part in 3D Objects
Yunhan Yang, Yukun Huang, Yuan-Chen Guo, Liangjun Lu, Xiaoyang Wu, Edmund Y. Lam, Yan-Pei Cao, Xihui Liu
arXiv 2024
project page / arXiv / code / dataset (PartObjaverse-Tiny)

Zero-shot, multi-granularity 3D part segmentation using vision foundation models to learn scalable, flexible 3D features without requiring predefined part labels.

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
Kaiyi Huang, Yukun Huang, Xuefei Ning, Zinan Lin, Yu Wang, Xihui Liu
arXiv 2024
project page / arXiv / code / video

An iterative, self-correcting multi-agent collaborative framework for compositional text-to-video generation.

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu
CVPR 2024
project page / arXiv / code / model

Integrating multi-view conditions into pre-trained 2D diffusion models to generate controllable novel views for 3D object reconstruction.

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang, Jianan Wang, Yukai Shi, Boshi Tang, Xianbiao Qi, Lei Zhang
ICLR 2024
arXiv / paper

Analyzing the drawbacks of random timestep sampling in score distillation sampling (SDS) and proposing a non-increasing timestep sampling strategy.
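
As a rough illustration of the idea (the linear decay, value ranges, and names below are assumptions for this sketch, not the exact schedule proposed in the paper), a non-increasing timestep schedule replaces per-iteration random sampling like this:

    # Illustrative sketch: replace uniform-random timestep sampling in an SDS loop
    # with a schedule that never increases as optimization proceeds.
    # The linear decay and the bounds t_max/t_min are assumptions for this sketch.
    def timestep_schedule(step, max_steps, t_max=980, t_min=20):
        frac = step / max(max_steps - 1, 1)           # goes 0 -> 1 over the optimization
        return round(t_max + frac * (t_min - t_max))  # large t early, small t late

    # Hypothetical usage inside a diffusion-guided 3D optimization loop:
    # for step in range(max_steps):
    #     t = timestep_schedule(step, max_steps)      # instead of t ~ Uniform(t_min, t_max)
    #     ... add noise at level t, query the diffusion model, apply the SDS gradient ...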

TOSS: High-quality Text-guided Novel View Synthesis from a Single Image
Yukai Shi, Jianan Wang, He Cao, Boshi Tang, Xianbiao Qi, Tianyu Yang, Yukun Huang, Shilong Liu, Lei Zhang, Heung-Yeung Shum
ICLR 2024
project page / arXiv / code / model

Using text as semantic guidance to further constrain the solution space of novel view synthesis (NVS), generating more plausible, controllable, and multiview-consistent novel views from a single image.

DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang, Jianan Wang, Ailing Zeng, He Cao, Xianbiao Qi, Yukai Shi, Zheng-Jun Zha, Lei Zhang
NeurIPS 2023
project page / arXiv / code / poster / gallery

High-quality animatable avatar generation from text via 3D-consistent, occlusion-aware score distillation sampling, ready for 3D scene composition with diverse interactions.

Professional Service
  • Reviewer: NeurIPS 2025; SIGGRAPH Asia 2025; ICCV 2025; ICLR 2025; CVPR 2025; ICML 2024; ECCV 2024; TPAMI; TIP; TMM; Neurocomputing; etc.
  • Teaching Assistant: Embodied AI 101 (Summer 2025), HKU; Computer Vision (Fall 2022), USTC.

Website template from Jon Barron's website.