DeepSeek published a paper on New Year's Day proposing a new architecture called MHC. The work aims to resolve the training instability that traditional hyperconnections exhibit in large-scale model training while preserving their significant performance gains. The paper has three co-first authors: Zhenda Xie, Yixuan Wei, and Huanqi Cao. Notably, DeepSeek founder and CEO Liang Wenfeng is also listed as an author.

Zhitongcaijing · 01/01 08:49