Daily Selection of AI Research Papers

Daily curated AI research papers with translations

BitNet b1.58 2B4T Technical Report

Shuming Ma, Hongyu Wang, Shaohan Huang, Xingxing Zhang, Ying Hu, Ting Song, Yan Xia, Furu Wei•Apr 16, 2025

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, Wanjun Zhong•Apr 15, 2025

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

Yijun Liang, Ming Li, Chenrui Fan, Ziyue Li, Dang Nguyen, Kwesi Cobbina, Shweta Bhardwaj, Jiuhai Chen, Fuxiao Liu, Tianyi Zhou•Apr 10, 2025

Cobra: Efficient Line Art COlorization with BRoAder References

Junhao Zhuang, Lingen Li, Xuan Ju, Zhaoyang Zhang, Chun Yuan, Ying Shan•Apr 16, 2025

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Hardy Chen, Haoqin Tu, Fali Wang, Hui Liu, Xianfeng Tang, Xinya Du, Yuyin Zhou, Cihang Xie•Apr 10, 2025

AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference

Yangshen Deng, Zhengxin You, Long Xiang, Qilong Li, Peiqi Yuan, Zhaoyang Hong, Yitao Zheng, Wanting Li, Runzhong Li, Haotian Liu, Kyriakos Mouratidis, Man Lung Yiu, Huan Li, Qiaomu Shen, Rui Mao, Bo Tang•Apr 14, 2025

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, Liang Zheng•Apr 14, 2025

MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?

Yunxiang Zhang, Muhammad Khalifa, Shitanshu Bhushan, Grant D Murphy, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang•Apr 13, 2025

SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning

Prabhat Pandey, Rupak Vignesh Swaminathan, K V Vijay Girish, Arunasish Sen, Jian Xie, Grant P. Strimel, Andreas Schwarz•Apr 12, 2025

Robust and Fine-Grained Detection of AI Generated Texts

Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, Drishti Sharma, Siddhant Gupta, Jebish Purbey, Ashay Srivastava, Subhasya TippaReddy, Arvind Reddy Bobbili, Suraj Telugara Chandrashekhar, Modabbir Adeeb, Srinadh Vura, Hamza Farooq•Apr 16, 2025

BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

Yongchang Wu, Zipeng Qi, Zhenwei Shi, Zhengxia Zou•Apr 12, 2025

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Nandan Thakur, Jimmy Lin, Sam Havens, Michael Carbin, Omar Khattab, Andrew Drozdov•Apr 17, 2025

"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services

Shira Michel, Sufi Kaur, Sarah Elizabeth Gillespie, Jeffrey Gleason, Christo Wilson, Avijit Ghosh•Apr 12, 2025

Towards Learning to Complete Anything in Lidar

Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting

Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution