ChatPaper.ai
AI Research Papers Daily
Daily curated AI research papers with translations
December 3rd, 2024
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
Zeyi Sun, Ziyang Chu, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Dec 2, 2024
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang
Nov 27, 2024
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiaoyi Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong, Yonghong Tian, Li Yuan
Nov 28, 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov, Denis Kuznedelev, Mikhail Khoroshikh, Valentin Khrulkov, Dmitry Baranchuk
Dec 2, 2024
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
Jinyuan Qu, Hongyang Li, Shilong Liu, Tianhe Ren, Zhaoyang Zeng, Lei Zhang
Nov 27, 2024
o1-Coder: an o1 Replication for Coding
Yuxiang Zhang, Shangxi Wu, Yuqi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jitao Sang
Nov 29, 2024
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang, Weiye Xiao, Zhengyu Lin, Huaizhong Zhang, Tianxiang Ren, Yang Gao, Zhiqian Lin, Zhongang Cai, Lei Yang, Ziwei Liu
Nov 29, 2024
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen
Dec 1, 2024
TinyFusion: Diffusion Transformers Learned Shallow
Gongfan Fang, Kunjun Li, Xinyin Ma, Xinchao Wang
Dec 2, 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang, Yong Man Ro, Yueh-Hua Wu
Dec 2, 2024
Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
Maitreya Patel, Song Wen, Dimitris N. Metaxas, Yezhou Yang
Nov 27, 2024
Efficient Track Anything
Yunyang Xiong, Chong Zhou, Xiaoyu Xiang, Lemeng Wu, Chenchen Zhu, Zechun Liu, Saksham Suri, Balakrishnan Varadarajan, Ramya Akula, Forrest Iandola, Raghuraman Krishnamoorthi, Bilge Soran, Vikas Chandra
Nov 28, 2024
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Taekyung Ki, Dongchan Min, Gyoungsu Chae
Dec 2, 2024
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Xin Yan, Yuxuan Cai, Qiuyue Wang, Yuan Zhou, Wenhao Huang, Huan Yang
Dec 2, 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan
Nov 26, 2024
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao
Nov 29, 2024
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information
Ryo Kamoi, Yusen Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Rui Zhang
Dec 1, 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Meng Cao, Haoran Tang, Haoze Zhao, Hangyu Guo, Jiaheng Liu, Ge Zhang, Ruyang Liu, Qiang Sun, Ian Reid, Xiaodan Liang
Dec 2, 2024
Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input
Francesco Taioli, Edoardo Zorzi, Gianni Franchi, Alberto Castellini, Alessandro Farinelli, Marco Cristani, Yiming Wang
Dec 2, 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, Perttu Isotalo, Maral Jabbarishiviari, Börje F. Karlsson, Eldar Khalilov, Christopher Klamm, Fajri Koto, Dominik Krzemiński, Gabriel Adriano de Melo, Syrielle Montariol, Yiyang Nan, Joel Niklaus, Jekaterina Novikova, Johan Samir Obando Ceron, Debjit Paul, Esther Ploeger, Jebish Purbey, Swati Rajwal, Selvan Sunitha Ravi, Sara Rydell, Roshan Santhosh, Drishti Sharma, Marjana Prifti Skenduli, Arshia Soltani Moakhar, Bardia Soltani Moakhar, Ran Tamir, Ayush Kumar Tarun, Azmine Toushik Wasi, Thenuka Ovin Weerasinghe, Serhan Yilmaz, Mike Zhang, Imanol Schlag, Marzieh Fadaee, Sara Hooker, Antoine Bosselut
Nov 29, 2024
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge
Hui Ren, Joanna Materzynska, Rohit Gandikota, David Bau, Antonio Torralba
Nov 29, 2024
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou
Nov 29, 2024
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang, Shuangfei Zhai, Miguel Angel Bautista, Kevin Miao, Alexander Toshev, Joshua Susskind, Jiatao Gu
Dec 2, 2024
Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
Aditya Narayan Sankaran, Reza Farahbakhsh, Noel Crespi
Dec 2, 2024