Zhitongcaijing · 03/11 11:41
iFLYTEK announced that it and Huawei have recently become the first to achieve large-scale cross-node expert parallel cluster inference on domestic computing power. According to the announcement, through distributed architecture innovation and collaborative algorithm optimization, single-card static memory usage was cut to one quarter of that of a dual-machine deployment, efficiency rose by 75%, expert computing density increased fourfold, inference throughput increased 3.2 times, and end-to-end latency was halved. The solution will also be applied to accelerate the training of the iFLY Spark deep reasoning model, where inference efficiency during training is expected to improve by 200%.
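
The announcement does not describe the implementation, but for context, expert parallelism in a mixture-of-experts (MoE) model means each expert sub-network lives on its own device or node and a gating function routes every token to only its top-scoring experts, so activations must be exchanged across nodes at each MoE layer. The NumPy sketch below is a minimal toy illustration of that routing idea only; it is not iFLYTEK's or Huawei's system, and every name and number in it (num_experts, top_k, the matrix sizes) is an assumption made for the example.

```python
import numpy as np

# Toy illustration of expert-parallel routing in a mixture-of-experts layer.
# Each "expert" stands in for a sub-network that would live on its own
# device/node in a real cross-node deployment; here they are plain matrices.

rng = np.random.default_rng(0)

num_tokens, hidden = 8, 16
num_experts, top_k = 4, 2            # illustrative values, not from the article

x = rng.standard_normal((num_tokens, hidden))
gate_w = rng.standard_normal((hidden, num_experts))
expert_w = rng.standard_normal((num_experts, hidden, hidden))

# Gating: score each token against every expert, keep the top-k experts.
logits = x @ gate_w
topk_idx = np.argsort(-logits, axis=1)[:, :top_k]            # (tokens, top_k)
topk_logits = np.take_along_axis(logits, topk_idx, axis=1)
weights = np.exp(topk_logits)
weights /= weights.sum(axis=1, keepdims=True)                 # softmax over top-k

# Dispatch: in a real cluster, each expert's tokens would be sent to the node
# hosting that expert; here we simply loop over experts on one machine.
output = np.zeros_like(x)
for e in range(num_experts):
    token_ids, slot = np.nonzero(topk_idx == e)               # tokens routed to expert e
    if token_ids.size == 0:
        continue
    expert_out = x[token_ids] @ expert_w[e]                   # expert forward pass
    output[token_ids] += weights[token_ids, slot][:, None] * expert_out

print(output.shape)  # (8, 16): each token is a weighted sum of its top-k expert outputs
```

In an actual cross-node deployment, the per-expert loop above would be replaced by an all-to-all exchange that ships each token's activations to the node hosting its selected experts and gathers the results back, which is where the distributed-architecture and scheduling work described in the announcement would come in.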