README_EN.md · openpangu/openPangu-2.0-Flash at main

openPangu-2.0-Flash is an MoE model trained on Ascend. The model has 92B total parameters, of which 6B are activated per token. Its context length is 512k tokens. The pretraining corpus contains 34T tokens in total. During post-training, openPangu-2.0-Flash goes through unified SFT that covers both slow and fast thinking, RL training of multiple specialists, and on-policy distillation that combines those RL specialists.

- Efficient attention: The model retains MLA for efficient inference and combines DSA and SWA in a 1:2 layer ratio. SWA layers handle local-window modeling, while DSA layers capture sparse global context. This design lowers compute, memory footprint, and memory access costs for long-context inference while preserving accuracy.

- Residual topology: The conventional residual path is replaced with a 4-stream mHC design, improving representation diversity and generalization.

- Multi-token prediction (MTP): The model uses three MTP heads to draft three additional tokens per step, enabling faster inference through self-speculative decoding.
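
The MTP bullet above can be illustrated with a minimal sketch of one self-speculative decoding loop. It assumes greedy verification (a drafted token is kept only if the main model would have produced the same token), which is one common scheme and not necessarily the one openPangu-2.0-Flash uses. The names `target_next`, `draft_next`, `speculative_step`, and `generate` are hypothetical, and the two "models" are toy functions over integers rather than real networks.

```python
from typing import Callable, List, Tuple

NUM_DRAFT = 3  # three drafted tokens per step, matching the three MTP heads

NextToken = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the main model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[int]) -> int:
    """Toy stand-in for the MTP heads: agrees with the main model
    except after token 5, where it deliberately drafts a wrong token."""
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


def speculative_step(prefix: List[int], target: NextToken, draft: NextToken,
                     num_draft: int = NUM_DRAFT) -> List[int]:
    """Run one draft-then-verify step and return the tokens it commits."""
    # 1. Draft: cheaply propose num_draft tokens in a row.
    ctx = list(prefix)
    drafts = []
    for _ in range(num_draft):
        tok = draft(ctx)
        drafts.append(tok)
        ctx.append(tok)

    # 2. Verify: a real system would score all drafts in one batched
    #    forward pass; this sketch queries the toy main model per position.
    ctx = list(prefix)
    accepted = []
    for tok in drafts:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the main model's token, drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # All drafts accepted: the verification pass yields one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], max_new_tokens: int,
             target: NextToken, draft: NextToken) -> Tuple[List[int], int]:
    """Decode max_new_tokens tokens; return the sequence and the step count."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < max_new_tokens:
        out.extend(speculative_step(out, target, draft))
        steps += 1
    return out[: len(prefix) + max_new_tokens], steps
```

Under greedy verification the output is identical to plain greedy decoding with the main model. The only thing that changes is the number of main-model steps, which shrinks whenever drafts are accepted. For example, `generate([0], 8, target_next, draft_next)` commits eight tokens in three steps in this toy setup.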
