Daily AI Research Paper Picks
A daily selection of AI research papers, with translations
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang•Feb 24, 2025•734
Thus Spake Long-Context Large Language Model
Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu•Feb 24, 2025•686
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon, Avishai Elmakies, Yossi Adi•Feb 19, 2025•662
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Canyu Zhao, Mingyu Liu, Huanyi Zheng, Muzhi Zhu, Zhiyue Zhao, Hao Chen, Tong He, Chunhua Shen•Feb 24, 2025•513
Audio-FLAN: A Preliminary Release
Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue•Feb 23, 2025•342
GCC: Generative Color Constancy via Diffusing a Color Checker
Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang, Yi-Chen Lo, Yu-Chee Tseng, Jiun-Long Huang, Yu-Lun Liu•Feb 24, 2025•272
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu Cheng•Feb 24, 2025•274
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Guijin Son, Jiwoo Hong, Hyunwoo Ko, James Thorne•Feb 24, 2025•242
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
Alexander Zhang, Marcus Dong, Jiaheng Liu, Wei Zhang, Yejie Wang, Jian Yang, Ge Zhang, Tianyu Liu, Zhongyuan Peng, Yingshui Tan, Yuanxing Zhang, Zhexu Wang, Weixun Wang, Yancheng He, Ken Deng, Wangchunshu Zhou, Wenhao Huang, Zhaoxiang Zhang•Feb 23, 2025•243
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
Min Zhao, Guande He, Yixiao Chen, Hongzhou Zhu, Chongxuan Li, Jun Zhu•Feb 21, 2025•203
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, Shiwei Liu•Feb 24, 2025•162
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan, Yue Fan, Hongquan Li, Shan Jiang, Yang Zhao, Xinze Guan, Ching-Chen Kuo, Xin Eric Wang•Feb 22, 2025•162
Beyond Release: Access Considerations for Generative AI Systems
Irene Solaiman, Rishi Bommasani, Dan Hendrycks, Ariel Herbert-Voss, Yacine Jernite, Aviya Skowron, Andrew Trask•Feb 23, 2025•122
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song, You Xie, Chenxu Zhang, Xin Chen, Chao Wang, Di Chang, Linjie Luo•Feb 24, 2025•113
Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
Junyang Wang, Haiyang Xu, Xi Zhang, Ming Yan, Ji Zhang, Fei Huang, Jitao Sang•Feb 24, 2025•112
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo•Feb 23, 2025•112
Grounded Persuasive Language Generation for Automated Marketing
Jibang Wu, Chenghao Yang, Simon Mahns, Chaoqi Wang, Hao Zhu, Fei Fang, Haifeng Xu•Feb 24, 2025•103
Forecasting Open-Weight AI Model Growth on Hugging Face
Kushal Raj Bhandari, Pin-Yu Chen, Jianxi Gao•Feb 21, 2025•103
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Giuseppe Paolo, Abdelhakim Benechehab, Hamza Cherkaoui, Albert Thomas, Balázs Kégl•Feb 21, 2025•82
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
Zhenglin Wang, Jialong Wu, Pengfei LI, Yong Jiang, Deyu Zhou•Feb 24, 2025•74
InductionBench: LLMs Fail in the Simplest Complexity Class
Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang•Feb 20, 2025•62
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Artyom Kharinaev, Viktor Moskvoretskii, Egor Shvetsov, Kseniia Studenikina, Bykov Mikhail, Evgeny Burnaev•Feb 18, 2025•62
Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation
Jiayu Yang, Taizhang Shang, Weixuan Sun, Xibin Song, Ziang Cheng, Senbo Wang, Shenzhou Chen, Weizhe Liu, Hongdong Li, Pan Ji•Feb 20, 2025•52
Can Community Notes Replace Professional Fact-Checkers?
Nadav Borenstein, Greta Warren, Desmond Elliott, Isabelle Augenstein•Feb 19, 2025•52
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
Zaid Khan, Ali Farhadi, Ranjay Krishna, Luca Weihs, Mohit Bansal, Tanmay Gupta•Feb 21, 2025•42
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li, William Barr Held, Michael J Ryan, Kunat Pipatanakul, Potsawee Manakul, Hao Zhu, Diyi Yang•Feb 21, 2025•32
Early-Exit and Instant Confidence Translation Quality Estimation
Vilém Zouhar, Maike Züfle, Beni Egressy, Julius Cheng, Jan Niehues•Feb 20, 2025•32
Self-Taught Agentic Long Context Understanding
Yufan Zhuang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu, Yusheng Su, Jingbo Shang, Zicheng Liu, Emad Barsoum•Feb 21, 2025•22
MONSTER: Monash Scalable Time Series Evaluation Repository
Angus Dempster, Navid Mohammadi Foumani, Chang Wei Tan, Lynn Miller, Amish Mishra, Mahsa Salehi, Charlotte Pelletier, Daniel F. Schmidt, Geoffrey I. Webb•Feb 21, 2025•22
MegaLoc: One Retrieval to Place Them All
Gabriele Berton, Carlo Masone•Feb 24, 2025•12
Diagnosing COVID-19 Severity from Chest X-Ray Images Using ViT and CNN Architectures
Luis Lara, Lucia Eve Berger, Rajesh Raju, Shawn Whitfield•Feb 23, 2025•12
M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment
Chuan Cui, Kejiang Chen, Zhihua Wei, Wen Shen, Weiming Zhang, Nenghai Yu•Feb 21, 2025•12
The snake in the Brownian sphere
Omer Angel, Emmanuel Jacob, Brett Kolesnik, Grégory Miermont•Feb 18, 2025•12