ChatPaper.ai
Daily Picks of AI Research Papers
Daily curated AI research papers with translations
October 4th, 2024
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang
Oct 3, 2024 • 55 • 2
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia Wei, Pengle Zhang, Jun Zhu, Jianfei Chen
Oct 3, 2024 • 50 • 5
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun
Oct 2, 2024 • 42 • 2
Video Instruction Tuning With Synthetic Data
Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li
Oct 3, 2024 • 39 • 3
Loong: Generating Minute-level Long Videos with Autoregressive Language Models
Yuqing Wang, Tianwei Xiong, Daquan Zhou, Zhijie Lin, Yang Zhao, Bingyi Kang, Jiashi Feng, Xihui Liu
Oct 3, 2024 • 38 • 3
Contrastive Localized Language-Image Pre-Training
Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan
Oct 3, 2024 • 38 • 3
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li
Oct 3, 2024 • 36 • 3
Large Language Models as Markov Chains
Oussama Zekri, Ambroise Odonnat, Abdelhakim Benechehab, Linus Bleistein, Nicolas Boullé, Ievgen Redko
Oct 3, 2024 • 33 • 3
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber
Oct 3, 2024 • 31 • 4
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, Nicolas Le Roux
Oct 2, 2024 • 25 • 2
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William Held, Ella Li, Michael Ryan, Weiyan Shi, Yanzhe Zhang, Diyi Yang
Oct 3, 2024 • 23 • 5
Contextual Document Embeddings
John X. Morris, Alexander M. Rush
Oct 3, 2024 • 23 • 4
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng
Sep 28, 2024 • 20 • 2
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
Oct 3, 2024 • 12 • 3
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang
Oct 3, 2024 • 10 • 3
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
Shayekh Bin Islam, Md Asib Rahman, K S M Tozammel Hossain, Enamul Hoque, Shafiq Joty, Md Rizwan Parvez
Oct 2, 2024 • 10 • 3
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang, Anish Kachinthaya, Suzie Petryk, Yossi Gandelsman
Oct 3, 2024 • 9 • 2
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation
Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel
Oct 3, 2024 • 9 • 5
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning
Xiao Yu, Baolin Peng, Vineeth Vajipey, Hao Cheng, Michel Galley, Jianfeng Gao, Zhou Yu
Oct 2, 2024 • 9 • 2
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis
Xiaobiao Du, Yida Wang, Xin Yu
Oct 2, 2024 • 8 • 3
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos
Jianrui Zhang, Mu Cai, Yong Jae Lee
Oct 3, 2024 • 7 • 2
Intelligence at the Edge of Chaos
Shiyang Zhang, Aakash Patel, Syed A Rizvi, Nianchen Liu, Sizhuang He, Amin Karbasi, Emanuele Zappala, David van Dijk
Oct 3, 2024 • 6 • 2
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh, Sonal Kumar, Zhifeng Kong, Rafael Valle, Bryan Catanzaro, Dinesh Manocha
Oct 2, 2024 • 6 • 2
Learning the Latent Rules of a Game from Data: A Chess Story
Ben Fauber
Oct 3, 2024 • 5 • 2
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu
Oct 2, 2024 • 5 • 3
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang, Haifeng Huang, Yuzhang Shang, Mubarak Shah, Yan Yan
Sep 30, 2024 • 5 • 2
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics
Zhiwen You, Kanyao Han, Haotian Zhu, Bertram Ludäscher, Jana Diesner
Oct 2, 2024 • 4 • 3