ChatPaper.ai
Daily Selection of AI Research Papers
Daily curated AI research papers with translations
November 21st, 2024
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
Nov 17, 2024
11
3
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen
Nov 17, 2024
41
6
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
Nov 20, 2024
24
3
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li
Nov 20, 2024
15
4
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
Nov 18, 2024
12
3
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang
Nov 20, 2024
11
2
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Yu Gu, Boyuan Zheng, Boyu Gou, Kai Zhang, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su
Nov 10, 2024
10
2
Stylecodes: Encoding Stylistic Information For Image Generation
Ciara Rowles
Nov 19, 2024
7
2
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
Vipula Rawte, Sarthak Jain, Aarush Sinha, Garv Kaushik, Aman Bansal, Prathiksha Rumale Vishwanath, Samyak Rajesh Jain, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha, Amit P. Sheth, Amitava Das
Nov 16, 2024
6
3
Loss-to-Loss Prediction: Scaling Laws for All Datasets
David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach, Sham Kakade
Nov 19, 2024
5
2
Generating Compositional Scenes via Text-to-image RGBA Instance Generation
Alessandro Fontanella, Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang, Sarah Parisot
Nov 16, 2024
2
2
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai
Nov 20, 2024
2
2