Daily Picks of AI Research Papers

A daily selection of AI research papers, with translations

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Shaofei Cai, Zihao Wang, Kewei Lian, Zhancun Mu, Xiaojian Ma, Anji Liu, Yitao Liang•Oct 23, 2024•526

Continuous Speech Synthesis using per-token Latent Diffusion

Arnon Turetzky, Nimrod Shabtay, Slava Shechtman, Hagai Aronowitz, David Haws, Ron Hoory, Avihu Dekel•Oct 21, 2024•303

Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang•Oct 21, 2024•242

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong•Oct 25, 2024•232

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

S Sakshi, Utkarsh Tyagi, Sonal Kumar, Ashish Seth, Ramaneswaran Selvakumar, Oriol Nieto, Ramani Duraiswami, Sreyan Ghosh, Dinesh Manocha•Oct 24, 2024•202

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Shuhao Gu, Jialing Zhang, Siyuan Zhou, Kevin Yu, Zhaohu Xing, Liangdong Wang, Zhou Cao, Jintao Jia, Zhuoyi Zhang, Yixuan Wang, Zhenchong Hu, Bo-Wen Zhang, Jijie Li, Dong Liang, Yingli Zhao, Yulong Ao, Yaoqi Liu, Fangxiang Feng, Guang Liu•Oct 24, 2024•202

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design

Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang•Oct 24, 2024•152

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance

Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart•Oct 24, 2024•152

Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions

Dairazalia Sánchez-Cortés, Sergio Burdisso, Esaú Villatoro-Tello, Petr Motlicek•Oct 23, 2024•52

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Max Wilcoxson, Qiyang Li, Kevin Frans, Sergey Levine•Oct 23, 2024•42

Counting Ability of Large Language Models and Impact of Tokenization

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

Analysing the Residual Stream of Language Models Under Knowledge Conflicts

Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

Reflection-Bench: probing AI intelligence with reflection
