ChatPaper.ai
Daily Picks of AI Research Papers
Daily curated AI research papers and translations
March 28th, 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng, Kaixiong Gong, Bohao Li, Zonghao Guo, Yibing Wang, Tianshuo Peng, Benyou Wang, Xiangyu Yue
Mar 27, 2025 • 78 • 6
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, Ming Zhang
Mar 27, 2025 • 77 • 2
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
Zhengxi Lu, Yuxiang Chai, Yaxuan Guo, Xi Yin, Liang Liu, Hao Wang, Guanjing Xiong, Hongsheng Li
Mar 27, 2025 • 61 • 9
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
Haoxiang Sun, Yingqian Min, Zhipeng Chen, Wayne Xin Zhao, Zheng Liu, Zhongyuan Wang, Lei Fang, Ji-Rong Wen
Mar 27, 2025 • 37 • 4
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, Wei-Shi Zheng, Yu Qiao, Ziwei Liu
Mar 27, 2025 • 33 • 2
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Zhicheng Lee, Shulin Cao, Jinxin Liu, Jiajie Zhang, Weichuan Liu, Xiaoyin Che, Lei Hou, Juanzi Li
Mar 27, 2025 • 28 • 4
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
Shitian Zhao, Qilong Wu, Xinyue Li, Bo Zhang, Ming Li, Qi Qin, Dongyang Liu, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Peng Gao, Bin Fu, Zhen Li
Mar 27, 2025 • 26 • 2
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model
Jinwei Qi, Chaonan Ji, Sheng Xu, Peng Zhang, Bang Zhang, Liefeng Bo
Mar 27, 2025 • 25 • 3
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Wenqi Zhang, Mengna Wang, Gangao Liu, Xu Huixin, Yiwei Jiang, Yongliang Shen, Guiyang Hou, Zhe Zheng, Hang Zhang, Xin Li, Weiming Lu, Peng Li, Yueting Zhuang
Mar 27, 2025 • 22 • 3
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao
Mar 27, 2025 • 21 • 2
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition
Yujie Liu, Zonglin Yang, Tong Xie, Jinjie Ni, Ben Gao, Yuqiang Li, Shixiang Tang, Wanli Ouyang, Erik Cambria, Dongzhan Zhou
Mar 27, 2025 • 20 • 2
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications
Yupeng Cao, Haohang Li, Yangyang Yu, Shashidhar Reddy Javaji, Yueru He, Jimin Huang, Zining Zhu, Qianqian Xie, Xiao-yang Liu, Koduvayur Subbalakshmi, Meikang Qiu, Sophia Ananiadou, Jian-Yun Nie
Mar 26, 2025 • 19 • 2
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao, Xingyu Ni, Ziyu Wang, Feng Cheng, Ziyan Yang, Lu Jiang, Bohan Wang
Mar 26, 2025 • 16 • 3
Optimal Stepsize for Diffusion Sampling
Jianning Pei, Han Hu, Shuyang Gu
Mar 27, 2025 • 13 • 2
Exploring the Evolution of Physics Cognition in Video Generation: A Survey
Minghui Lin, Xiang Wang, Yishan Wang, Shu Wang, Fengqi Dai, Pengxiang Ding, Cunxiang Wang, Zhengrong Zuo, Nong Sang, Siteng Huang, Donglin Wang
Mar 27, 2025 • 11 • 2
Unified Multimodal Discrete Diffusion
Alexander Swerdlow, Mihir Prabhudesai, Siddharth Gandhi, Deepak Pathak, Katerina Fragkiadaki
Mar 26, 2025 • 9 • 2
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi
Mar 26, 2025 • 8 • 2
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos, Marc Botet Colomer, Linus Härenstam-Nielsen, Mattia Segu, Pier Luigi Dovesi, Jussi Karlgren, Daniel Cremers, Federico Tombari, Matteo Poggi
Mar 27, 2025 • 7 • 2
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging
Haoming Xu, Shuxun Wang, Yanqiu Zhao, Yi Zhong, Ziyan Jiang, Ningyuan Zhao, Shumin Deng, Huajun Chen, Ningyu Zhang
Mar 27, 2025 • 7 • 2
LLPut: Investigating Large Language Models for Bug Report-Based Input Generation
Alif Al Hasan, Subarna Saha, Mia Mohammad Imran, Tarannum Shaila Zaman
Mar 26, 2025 • 5 • 2
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi
Mar 25, 2025 • 2 • 2
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni, Meet Soni, Sirisha Rambhatla
Mar 27, 2025 • 1 • 2