AI Research Papers Daily

A daily selection of AI research papers, with translations

Are Your LLMs Capable of Stable Reasoning?

Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, Kai Chen•Dec 17, 2024•953

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Shuting Wang, Jiejun Tan, Zhicheng Dou, Ji-Rong Wen•Dec 17, 2024•422

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models

YiFan Zhang, Shanglin Lei, Runqi Qiao, Zhuoma GongQue, Xiaoshuai Song, Guanting Dong, Qiuna Tan, Zhe Wei, Peiqing Yang, Ye Tian, Yadong Xue, Xiaofei Wang, Honggang Zhang•Dec 17, 2024•423

Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

Jeffrey Cheng, Benjamin Van Durme•Dec 17, 2024•362

Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers

Seungwook Han, Jinyeop Song, Jeff Gore, Pulkit Agrawal•Dec 16, 2024•152

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A. Rossi, Dinesh Manocha•Dec 14, 2024•152

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Mark Endo, Xiaohan Wang, Serena Yeung-Levy•Dec 17, 2024•132

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Yifei Zhou, Qianlan Yang, Kaixiang Lin, Min Bai, Xiong Zhou, Yu-Xiong Wang, Sergey Levine, Erran Li•Dec 17, 2024•122

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion

Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

MIVE: New Design and Benchmark for Multi-Instance Video Editing

When to Speak, When to Abstain: Contrastive Decoding with Abstention
