Hyper-Connections

September 29, 2024
Authors: Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou
cs.AI

Abstract

We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.
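The abstract's core idea — replacing the fixed identity shortcut of a residual connection with learnable connection strengths between features at different depths — can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the expansion into `n` parallel streams and the names `alpha`, `beta`, and `gamma` for the read, stream-mixing, and write weights are assumptions made for this sketch.

```python
import numpy as np

def hyper_connection_step(H, alpha, beta, gamma, f):
    """One layer wrapped in a hyper-connection-style update (illustrative sketch).

    H     : (n, d) array of n parallel hidden streams
    alpha : (n,)   learnable weights reading the streams into the layer input
    beta  : (n, n) learnable weights mixing the streams among themselves
    gamma : (n,)   learnable weights writing the layer output back to each stream
    f     : the layer itself (e.g. a transformer block); np.tanh stands in here
    """
    x = alpha @ H                     # read: weighted sum across the n streams
    y = f(x)                          # apply the layer
    return beta @ H + np.outer(gamma, y)  # write: remix streams + add output
```

With `n = 1` and unit weights (`alpha = [1]`, `beta = [[1]]`, `gamma = [1]`), the update reduces to the standard residual connection `H + f(H)`; letting the network learn these weights is what allows it to strengthen, weaken, or rearrange connections across depth.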

