
LSNet: See Large, Focus Small

March 29, 2025
Authors: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding
cs.AI

Abstract
Vision network designs, including Convolutional Neural Networks and Vision Transformers, have significantly advanced the field of computer vision. Yet their complex computations pose challenges for practical deployment, particularly in real-time applications. To tackle this issue, researchers have explored various lightweight and efficient network designs. However, existing lightweight models predominantly rely on self-attention mechanisms and convolutions for token mixing. This dependence limits the effectiveness and efficiency of the perception and aggregation processes in lightweight networks, making it hard to balance performance and efficiency under limited computational budgets. In this paper, we draw inspiration from the dynamic heteroscale vision ability inherent in the efficient human vision system and propose a "See Large, Focus Small" strategy for lightweight vision network design. We introduce LS (Large-Small) convolution, which combines large-kernel perception and small-kernel aggregation. It can efficiently capture a wide range of perceptual information and achieve precise feature aggregation for dynamic and complex visual representations, thus enabling proficient processing of visual information. Based on LS convolution, we present LSNet, a new family of lightweight models. Extensive experiments demonstrate that LSNet achieves superior performance and efficiency over existing lightweight networks in various vision tasks. Code and models are available at https://github.com/jameslahm/lsnet.
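
The abstract names the two stages of LS convolution but does not give their details, so the following is only a toy 1D sketch of the general idea: a wide window summarizes context at each position, and that summary then sets position-dependent weights for a narrow aggregation window. Everything here is an assumption made for illustration, including the function names, the kernel sizes 7 and 3, the averaging used for perception, and the softmax rule that turns context into weights. It is not the authors' implementation, which is available in the repository linked above.

```python
import math


def large_kernel_perception(x, k_large=7):
    """Summarize a wide neighborhood per position (zero-padded mean).

    Hypothetical stand-in for a learned large-kernel operator.
    """
    n, r = len(x), k_large // 2
    context = []
    for i in range(n):
        window = [x[j] for j in range(i - r, i + r + 1) if 0 <= j < n]
        context.append(sum(window) / k_large)  # out-of-range taps count as 0
    return context


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def small_kernel_aggregation(x, context, k_small=3):
    """Aggregate a narrow neighborhood with weights that depend on context.

    The weight rule (softmax over context * offset) is an illustrative
    assumption; a real model would learn how to generate these weights.
    """
    n, r = len(x), k_small // 2
    offsets = range(-r, r + 1)
    out = []
    for i in range(n):
        weights = softmax([context[i] * d for d in offsets])
        acc = 0.0
        for w, d in zip(weights, offsets):
            j = i + d
            if 0 <= j < n:  # zero padding at the borders
                acc += w * x[j]
        out.append(acc)
    return out


def ls_conv_1d(x, k_large=7, k_small=3):
    """Toy 'see large, focus small' pass over a 1D signal."""
    context = large_kernel_perception(x, k_large)
    return small_kernel_aggregation(x, context, k_small)
```

Calling `ls_conv_1d` on a list of floats returns a list of the same length. The point of the split is that the aggregation window stays small and cheap while its weights still reflect a much wider neighborhood.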
