Hao Yang | 杨昊

I joined Moonshot.ai in late 2023, where I am in charge of multi-modal data infrastructure.

Before joining Moonshot.ai, I was a Research Scientist in the ByteDance ICVG Group from 2022 to 2023. Before that, starting in late 2017, I was a Senior Researcher in the Visual Computing Group of Microsoft Research Asia, which I joined right after receiving both my B.S. and Ph.D. degrees from the School of Software, Tsinghua University.

Starting from scratch, I personally designed and implemented the storage formats, data management system, data processing pipelines, dataloaders, reviewing services, and even the Moonshot.ai chat templates for images and videos, covering both pre-training and post-training (including reinforcement learning). All of these (known as "m3data" and "m3data-zoo" within Moonshot.ai) were developed in-house, by myself.

I also designed and implemented, entirely by myself, the unified image and video pre-processor (known as "media-proc" and "mecord"), which has become a fundamental component of the training and inference infrastructure for Moonshot.ai's VLMs.

In addition, I designed and implemented an infrastructure for visual agents (known as "m3env"), including image search and Python tools, workflows for visual agent data rollout and reinforcement learning, support for dynamic toolkit installation and uninstallation, and more.

LLMs and VLMs

Kimi Team, Kimi k1.5: Scaling Reinforcement Learning with LLMs. Technical Report. [paper]

reinforcement learning llms vlms
Kimi Team, KimiVL Technical Report. Technical Report. [paper]

vlms llms reinforcement learning

3D rendering and avatars

Huichao Zhang, Bowen Chen, Hao Yang, Liao Qu, Xu Wang, Li Chen, Chao Long, Feida Zhu, Kang Du, Min Zheng, AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose. Technical Report. [paper][project]

zero-shot 3d avatar generation diffusion models 3d rendering nerf
Shizun Wang, Weihong Zeng, Xu Wang, Hao Yang, Li Chen, Yi Yuan, Yunzhao Zeng, Min Zheng, Chuang Zhang, Ming Wu, SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines. AAAI 2023 (oral). [paper]

face attributes avatar auto-creation adversarial training
Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen, Real-Time Neural Character Rendering with Pose-Guided Multiplane Images. European Conference on Computer Vision 2022. [paper][project]

3d rendering dynamic scene multi-plane images
Chulin Xie, Chuxin Wang, Bo Zhang, Hao Yang, Dong Chen, Fang Wen, Style-based Point Generator with Adversarial Rendering for Point Cloud Completion. Computer Vision and Pattern Recognition 2021. [paper] [code]

3d point cloud point rendering point completion adversarial training

Pre-training for applications

Xiaoyi Dong, Yinglin Zheng, Jianmin Bao, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu, MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining. Computer Vision and Pattern Recognition 2023. [paper]

multi-modality pre-training
Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen, General Facial Representation Learning in a Visual-Linguistic Manner. Computer Vision and Pattern Recognition 2022 (oral). [paper][code]

transformer multi-modality pre-training face alignment face parsing face attributes
Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen, Large Scale Pre-training for Person Re-identification with Noisy Labels. Computer Vision and Pattern Recognition 2022.

weakly-supervised learning person re-id pre-training
Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen, Unsupervised Pre-training for Person Re-identification. Computer Vision and Pattern Recognition 2021. [paper]

self-supervised learning person re-id pre-training

Face forgery and face forensics

Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen, FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping. Computer Vision and Pattern Recognition 2020 (oral). [paper]

face synthesis face swapping adversarial training deepfake
Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, Baining Guo, Face X-ray for More General Face Forgery Detection. Computer Vision and Pattern Recognition 2020 (oral). [paper]

face forensics deepfake detection

Face parsing, face landmarks and face editing

Jinpeng Lin, Hao Yang, Dong Chen, Ming Zeng, Fang Wen, Lu Yuan, Face Parsing with RoI Tanh-warping. Computer Vision and Pattern Recognition 2019. [paper] [data]

face parsing segmentation
Shuyang Gu, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen, Lu Yuan, Mask-Guided Portrait Editing with Conditional GANs. Computer Vision and Pattern Recognition 2019. [paper]

face synthesis adversarial training face parsing
Yangyu Huang, Xi Chen, Jongyoo Kim, Hao Yang, Chong Li, Jiaolong Yang, Dong Chen, FreeEnricher: Enriching Face Landmarks without Additional Cost. AAAI 2023. [paper] [code]

face alignment
Yangyu Huang, Hao Yang, Chong Li, Jongyoo Kim, Fangyun Wei, ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment. International Conference on Computer Vision 2021. [paper]

face alignment

Single-view 3D reconstruction

Hao Yang and Hui Zhang, Automatic 3D reconstruction of a polyhedral object from a single line drawing under perspective projection. Computers & Graphics 65 (2017): 45-59. [paper]

3d reconstruction single view line drawing
Hao Yang and Hui Zhang, Efficient 3D Room Shape Recovery from a Single Panorama. Computer Vision and Pattern Recognition 2016 (full oral). [paper]

3d reconstruction single view panorama
Yunfeng Liang, Hao Yang, and Hui Zhang, A Per-pixel Noise Detection Approach for Example-Based Photometric Stereo. Computers & Graphics 46 (2015): 327-335. [paper]

3d reconstruction photometric stereo