I joined Moonshot.ai in late 2023, where I am in charge of multi-modal data infrastructure.
Before joining Moonshot.ai, I was a Research Scientist in the ByteDance ICVG Group from 2022 to 2023. Before that, I was a Senior Researcher in the Visual Computing Group of Microsoft Research Asia from late 2017, a position I took up right after receiving both my B.S. and Ph.D. degrees from the School of Software, Tsinghua University.
Starting from scratch, I personally designed and implemented the storage formats, data management system, data processing pipelines, dataloaders, reviewing services, and even the Moonshot.ai chat templates for images and videos, covering both pre-training and post-training (including reinforcement learning). I developed all of these (called "m3data" and "m3data-zoo" at Moonshot.ai) in-house by myself.
I also designed and implemented, entirely by myself, the unified image and video pre-processor (called "media-proc" and "mecord"), which has become a fundamental component of the training and inference infrastructure for Moonshot.ai VLMs.
In addition, I designed and implemented the infrastructure for visual agents (called "m3env"). It includes image search and Python tools, workflows for visual agent data rollout and reinforcement learning, support for dynamic toolkit installation and uninstallation, and more.