About
- Xu Tan (谭旭) is a Research VP of Multimodality at Moonshot AI (a.k.a. Kimi). He was previously a Principal Research Manager in the Machine Learning Group at Microsoft Research Asia (MSRA). His work covers LLMs, multimodality, and generative AI for video and audio.
- He has published influential research papers with 15,000+ citations, including two best paper awards and several of the most-cited papers at AI conferences.
- He designed several models/systems for video (e.g., Kimi-Video, LanDiff, GAIA), audio (e.g., Kimi-Audio, FastSpeech 1/2, NaturalSpeech 1/2/3, Muzic), language (e.g., MASS, MPNet), and AI agents (e.g., HuggingGPT).
- Many of his technologies have been deployed in products: 1) Kimi-Video/Kimi-TTS in Kimi; 2) neural machine translation, pre-training models (MASS, MPNet), TTS (FastSpeech 1/2), ASR (FastCorrect 1/2), AI music (https://github.com/microsoft/muzic), and AI avatars in Microsoft products (e.g., Bing Search/Ads, Microsoft Translator, Azure TTS, Azure ASR, and Microsoft Xiaoice).
- He and his team have released several open-source projects on GitHub (with 30K+ stars), such as HuggingGPT/JARVIS, Kimi-Audio, MASS, MPNet, and Muzic.
- He is an Action Editor of Transactions on Machine Learning Research (TMLR), an Area Chair or Meta Reviewer for NeurIPS/ICML/AAAI/ICASSP, a Senior Member of IEEE, and a member of the Standing Committee on Computational Art of the China Computer Federation (CCF).