Daily Picks of AI Research Papers

A daily selection of AI research papers, with translations

Style-Friendly SNR Sampler for Style-Driven Generation

Jooyoung Choi, Chaehun Shin, Yeongtak Oh, Heeseung Kim, Sungroh Yoon•Nov 22, 2024•353

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, Hannaneh Hajishirzi•Nov 22, 2024•632

OminiControl: Minimal and Universal Control for Diffusion Transformer

Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang•Nov 22, 2024•6010

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

Gabriel Chua, Shing Yee Chan, Shaun Khoo•Nov 20, 2024•232

MyTimeMachine: Personalized Facial Age Transformation

Luchao Qi, Jiaye Wu, Bang Gong, Annie N. Wang, David W. Jacobs, Roni Sengupta•Nov 21, 2024•222

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel•Nov 20, 2024•182

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Kaichen Zhang, Yifei Shen, Bo Li, Ziwei Liu•Nov 22, 2024•174

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu•Nov 22, 2024•133

Efficient Long Video Tokenization via Coordinated-based Patch Reconstruction

Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo•Nov 22, 2024•112

Novel View Extrapolation with Video Diffusion Priors

Kunhao Liu, Ling Shao, Shijian Lu•Nov 21, 2024•103

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal•Nov 22, 2024•93

WildLMa: Long Horizon Loco-Manipulation in the Wild

Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images

One to rule them all: natural language to bind communication, perception and action
