Daily Picks of AI Research Papers
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, Dong Chen, Dongdong Chen, Junkun Chen, Weizhu Chen, Yen-Chun Chen, Yi-ling Chen, Qi Dai, Xiyang Dai, Ruchao Fan, Mei Gao, Min Gao, Amit Garg, Abhishek Goswami, Junheng Hao, Amr Hendy, Yuxuan Hu, Xin Jin, Mahmoud Khademi, Dongwoo Kim, Young Jin Kim, Gina Lee, Jinyu Li, Yunsheng Li, Chen Liang, Xihui Lin, Zeqi Lin, Mengchen Liu, Yang Liu, Gilsinia Lopez, Chong Luo, Piyush Madan, Vadim Mazalov, Ali Mousavi, Anh Nguyen, Jing Pan, Daniel Perez-Becker, Jacob Platin, Thomas Portet, Kai Qiu, Bo Ren, Liliang Ren, Sambuddha Roy, Ning Shang, Yelong Shen, Saksham Singhal, Subhojit Som, Xia Song, Tetyana Sych, Praneetha Vaddamanu, Shuohang Wang, Yiming Wang, Zhenghao Wang, Haibin Wu, Haoran Xu, Weijian Xu, Yifan Yang, Ziyi Yang, Donghan Yu, Ishmam Zabir, Jianwen Zhang, Li Lyna Zhang, Yunan Zhang, Xiren Zhou•Mar 3, 2025•736
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang•Mar 3, 2025•662
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, Huan Ling•Mar 3, 2025•392
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, Noah D. Goodman•Mar 3, 2025•313
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
Ziqian Ning, Huakang Chen, Yuepeng Jiang, Chunbo Hao, Guobin Ma, Shuai Wang, Jixun Yao, Lei Xie•Mar 3, 2025•262
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
Tong Wu, Junzhe Shen, Zixia Jia, Yuxuan Wang, Zilong Zheng•Feb 26, 2025•242
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, Guorui Zhou•Feb 26, 2025•232
When an LLM is apprehensive about its answers -- and when its uncertainty is justified
Petr Sychev, Andrey Goncharov, Daniil Vyazhev, Edvard Khalafyan, Alexey Zaytsev•Mar 3, 2025•192
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu Cheng•Mar 3, 2025•152
Efficient Test-Time Scaling via Self-Calibration
Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang•Feb 25, 2025•142
Speculative Ad-hoc Querying
Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, Venkat Arun•Mar 2, 2025•122
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin, Xin Yang, Meixi Chen, Yingjie Xu, Dongyu Yan, Leyi Wu, Xinli Xu, Lie XU, Shunsi Zhang, Ying-Cong Chen•Mar 3, 2025•112
Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions
Jia Chen, Qian Dong, Haitao Li, Xiaohui He, Yan Gao, Shaosheng Cao, Yi Wu, Ping Yang, Chen Xu, Yao Hu, Qingyao Ai, Yiqun Liu•Mar 1, 2025•112
Large-Scale Data Selection for Instruction Tuning
Hamish Ivison, Muru Zhang, Faeze Brahman, Pang Wei Koh, Pradeep Dasigi•Mar 3, 2025•102
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv, Honglin Guo, Qipeng Guo, Xipeng Qiu•Mar 2, 2025•102
SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity
Xiangyu Xi, Deyang Kong, Jian Yang, Jiawei Yang, Zhengyu Chen, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye•Mar 3, 2025•92
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang•Mar 3, 2025•82
CodeArena: A Collective Evaluation Platform for LLM Code Generation
Mingzhe Du, Anh Tuan Luu, Bin Ji, Xiaobao Wu, Dong Huang, Terry Yue Zhuo, Qian Liu, See-Kiong Ng•Mar 3, 2025•82
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao, Lei He, Haohan Guo, Fenglong Xie, Tan Lee•Mar 1, 2025•62
Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia
Chenxi Wang, Tianle Gu, Zhongyu Wei, Lang Gao, Zirui Song, Xiuying Chen•Mar 3, 2025•52
General Reasoning Requires Learning to Reason from the Get-go
Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal•Feb 26, 2025•42
Teaching Metric Distance to Autoregressive Multimodal Foundational Models
Jiwan Chung, Saejin Kim, Yongrae Jo, Jaewoo Park, Dongjun Min, Youngjae Yu•Mar 4, 2025•32
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng, Yongxin Chen, Huayu Chen, Guande He, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang•Mar 3, 2025•32
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments
Mingcong Lei, Ge Wang, Yiming Zhao, Zhixin Mai, Qing Zhao, Yao Guo, Zhen Li, Shuguang Cui, Yatong Han, Jinke Ren•Mar 2, 2025•32
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs
Yi-Lin Sung, Prateek Yadav, Jialu Li, Jaehong Yoon, Mohit Bansal•Mar 3, 2025•23
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis
Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, Yizheng Chen•Feb 27, 2025•22
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
Yaxuan Huang, Xili Dai, Jianan Wang, Xianbiao Qi, Yixing Yuan, Xiangyu Yue•Feb 24, 2025•22