Daily AI Research Paper Picks

A daily selection of AI research papers

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, Dong Chen, Dongdong Chen, Junkun Chen, Weizhu Chen, Yen-Chun Chen, Yi-ling Chen, Qi Dai, Xiyang Dai, Ruchao Fan, Mei Gao, Min Gao, Amit Garg, Abhishek Goswami, Junheng Hao, Amr Hendy, Yuxuan Hu, Xin Jin, Mahmoud Khademi, Dongwoo Kim, Young Jin Kim, Gina Lee, Jinyu Li, Yunsheng Li, Chen Liang, Xihui Lin, Zeqi Lin, Mengchen Liu, Yang Liu, Gilsinia Lopez, Chong Luo, Piyush Madan, Vadim Mazalov, Ali Mousavi, Anh Nguyen, Jing Pan, Daniel Perez-Becker, Jacob Platin, Thomas Portet, Kai Qiu, Bo Ren, Liliang Ren, Sambuddha Roy, Ning Shang, Yelong Shen, Saksham Singhal, Subhojit Som, Xia Song, Tetyana Sych, Praneetha Vaddamanu, Shuohang Wang, Yiming Wang, Zhenghao Wang, Haibin Wu, Haoran Xu, Weijian Xu, Yifan Yang, Ziyi Yang, Donghan Yu, Ishmam Zabir, Jianwen Zhang, Li Lyna Zhang, Yunan Zhang, Xiren Zhou · Mar 3, 2025 · 736

Visual-RFT: Visual Reinforcement Fine-Tuning

Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang · Mar 3, 2025 · 662

Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, Huan Ling · Mar 3, 2025 · 392

Efficient Test-Time Scaling via Self-Calibration

Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang · Feb 25, 2025 · 142

Speculative Ad-hoc Querying

Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, Venkat Arun · Mar 2, 2025 · 122

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

Jiantao Lin, Xin Yang, Meixi Chen, Yingjie Xu, Dongyu Yan, Leyi Wu, Xinli Xu, Lie XU, Shunsi Zhang, Ying-Cong Chen · Mar 3, 2025 · 112

Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions

Jia Chen, Qian Dong, Haitao Li, Xiaohui He, Yan Gao, Shaosheng Cao, Yi Wu, Ping Yang, Chen Xu, Yao Hu, Qingyao Ai, Yiqun Liu · Mar 1, 2025 · 112

Large-Scale Data Selection for Instruction Tuning

Hamish Ivison, Muru Zhang, Faeze Brahman, Pang Wei Koh, Pradeep Dasigi · Mar 3, 2025 · 102

SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity

Xiangyu Xi, Deyang Kong, Jian Yang, Jiawei Yang, Zhengyu Chen, Wei Wang, Jingang Wang, Xunliang Cai, Shikun Zhang, Wei Ye · Mar 3, 2025 · 92

CodeArena: A Collective Evaluation Platform for LLM Code Generation

Mingzhe Du, Anh Tuan Luu, Bin Ji, Xiaobao Wu, Dong Huang, Terry Yue Zhuo, Qian Liu, See-Kiong Ng · Mar 3, 2025 · 82

PodAgent: A Comprehensive Framework for Podcast Generation

Yujia Xiao, Lei He, Haohan Guo, Fenglong Xie, Tan Lee · Mar 1, 2025 · 62

General Reasoning Requires Learning to Reason from the Get-go

Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal · Feb 26, 2025 · 42

Teaching Metric Distance to Autoregressive Multimodal Foundational Models

Jiwan Chung, Saejin Kim, Yongrae Jo, Jaewoo Park, Dongjun Min, Youngjae Yu · Mar 4, 2025 · 32

CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

Mingcong Lei, Ge Wang, Yiming Zhao, Zhixin Mai, Qing Zhao, Yao Guo, Zhen Li, Shuguang Cui, Yatong Han, Jinke Ren · Mar 2, 2025 · 32