About
- Xu Tan (谭旭) is a Research VP of Multimodality at Moonshot AI (a.k.a. Kimi). He was previously a Principal Research Manager in the Machine Learning Group at Microsoft Research Asia (MSRA). His work covers LLMs, multimodality, and generative AI for video and audio.
- He has published influential research papers with 15,000+ citations, including two best paper awards and several of the most-cited papers at AI conferences.
- He designed several models/systems for video (e.g., Kimi-Video, LanDiff, GAIA), audio (e.g., Kimi-Audio, FastSpeech 1/2, NaturalSpeech 1/2/3, Muzic), language (e.g., MASS, MPNet), and AI agents (e.g., HuggingGPT).
- Many of his technologies have been deployed in products: 1) Kimi-Video/Kimi-TTS in Kimi; 2) neural machine translation, pre-training models (MASS, MPNet), TTS (FastSpeech 1/2), ASR (FastCorrect 1/2), AI music (https://github.com/microsoft/muzic), and AI avatars in Microsoft products (e.g., Bing Search/Ads, Microsoft Translator, Azure TTS, Azure ASR, and Microsoft Xiaoice).
- He and his team have released several open-source projects on GitHub (with 30K+ stars), such as HuggingGPT/JARVIS, Kimi-Audio, MASS, MPNet, and Muzic.
- He is an Action Editor of Transactions on Machine Learning Research (TMLR), an Area Chair or Meta Reviewer for NeurIPS/ICML/AAAI/ICASSP, a Senior Member of IEEE, and a member of the Standing Committee on Computational Art of the China Computer Federation (CCF).