ChatPaper.ai
Daily AI Research Paper Picks
A daily selection of AI research papers, with translations
January 8th, 2025
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Jian Hu
Jan 4, 2025 • 82 • 2
Cosmos World Foundation Model Platform for Physical AI
NVIDIA, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
Jan 7, 2025 • 63 • 2
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Zhe Yang, Yang Feng
Jan 7, 2025 • 48 • 4
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan, Xiangtai Li, Tao Zhang, Zilong Huang, Shilin Xu, Shunping Ji, Yunhai Tong, Lu Qi, Jiashi Feng, Ming-Hsuan Yang
Jan 7, 2025 • 40 • 2
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang
Jan 6, 2025 • 40 • 2
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Zekai Gu, Rui Yan, Jiahao Lu, Peng Li, Zhiyang Dou, Chenyang Si, Zhen Dong, Qifeng Liu, Cheng Lin, Ziwei Liu, Wenping Wang, Yuan Liu
Jan 7, 2025 • 22 • 2
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
Hao Zheng, Xinyan Guan, Hao Kong, Jia Zheng, Hongyu Lin, Yaojie Lu, Ben He, Xianpei Han, Le Sun
Jan 7, 2025 • 18 • 3
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis
Run Luo, Ting-En Lin, Haonan Zhang, Yuchuan Wu, Xiong Liu, Min Yang, Yongbin Li, Longze Chen, Jiaming Li, Lei Zhang, Yangyi Chen, Hamid Alinejad-Rokny, Fei Huang
Jan 8, 2025 • 16 • 3
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
Jiakang Yuan, Xiangchao Yan, Botian Shi, Tao Chen, Wanli Ouyang, Bo Zhang, Lei Bai, Yu Qiao, Bowen Zhou
Jan 7, 2025 • 14 • 3
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
Yuechen Zhang, Yaoyang Liu, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia
Jan 7, 2025 • 14 • 2
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Awadalla, Weizhu Chen, Mingyuan Zhou
Jan 6, 2025 • 9 • 2
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim
Jan 7, 2025 • 9 • 2
Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers
Markus J. Buehler
Jan 4, 2025 • 8 • 2
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
Mengting Wei, Tuomas Varanka, Xingxun Jiang, Huai-Qian Khor, Guoying Zhao
Jan 4, 2025 • 5 • 2
Generalizable Origin Identification for Text-Guided Image-to-Image Diffusion Models
Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang
Jan 4, 2025 • 3 • 2