ChatPaper.ai
Daily AI Research Paper Picks
A daily selection of AI research papers, with translations
December 16th, 2024
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, Ning Zhang, Serena Yeung-Levy, Xide Xia
•
Dec 13, 2024
•
139
12
GenEx: Generating an Explorable World
Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen
•
Dec 12, 2024
•
88
2
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai
•
Dec 12, 2024
•
35
4
Large Action Models: From Inception to Implementation
Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, Ray Huang, Si Qin, Qisheng Su, Jiayi Ye, Yudi Zhang, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
•
Dec 13, 2024
•
32
5
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Sara Pieri, Saeed Yahya Alseiari, Shanavas Cholakkal, Khaled Aldahmani, Fahad Khan, Rao Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal
•
Dec 10, 2024
•
26
2
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu
•
Dec 12, 2024
•
20
2
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
Jaehyeon Kim, Taehong Moon, Keon Lee, Jaewoong Cho
•
Dec 13, 2024
•
19
2
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai
•
Dec 12, 2024
•
19
3
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen
•
Dec 11, 2024
•
11
2
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, Fan Tang
•
Dec 10, 2024
•
11
3
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai
•
Dec 13, 2024
•
10
4
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag
•
Dec 12, 2024
•
9
2
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Yucheng Li, Huiqiang Jiang, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu
•
Dec 13, 2024
•
9
2
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
Baisen Wang, Le Zhuo, Zhaokai Wang, Chenxi Bao, Wu Chengjing, Xuecheng Nie, Jiao Dai, Jizhong Han, Yue Liao, Si Liu
•
Dec 12, 2024
•
7
4
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
Sarkar Snigdha Sarathi Das, Ryo Kamoi, Bo Pang, Yusen Zhang, Caiming Xiong, Rui Zhang
•
Dec 12, 2024
•
5
3
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
Sultan Alrashed
•
Dec 11, 2024
•
4
2
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, Jianwei Yang
•
Dec 13, 2024
•
2
2
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
Yasamin Medghalchi, Moein Heidari, Clayton Allard, Leonid Sigal, Ilker Hacihaliloglu
•
Dec 13, 2024
•
1
2