Daily Papers
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation
Daniil Selikhanovych, David Li, Aleksei Leonov, Nikita Gushchin, Sergei Kushneriuk, Alexander Filippov, Evgeny Burnaev, Iaroslav Koshelev, Alexander Korotin•Mar 17, 2025•822
Survey on Evaluation of LLM-based Agents
Asaf Yehudai, Lilach Eden, Alan Li, Guy Uziel, Yilun Zhao, Roy Bar-Haim, Arman Cohan, Michal Shmueli-Scheuer•Mar 20, 2025•582
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Hu•Mar 20, 2025•522
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qinxiang Lin, Jinwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue•Mar 20, 2025•353
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
NVIDIA, Alisson Azzolini, Hannah Brandon, Prithvijit Chattopadhyay, Huayu Chen, Jinju Chu, Yin Cui, Jenna Diamond, Yifan Ding, Francesco Ferroni, Rama Govindaraju, Jinwei Gu, Siddharth Gururani, Imad El Hanafi, Zekun Hao, Jacob Huffman, Jingyi Jin, Brendan Johnson, Rizwan Khan, George Kurian, Elena Lantz, Nayeon Lee, Zhaoshuo Li, Xuan Li, Tsung-Yi Lin, Yen-Chen Lin, Ming-Yu Liu, Andrew Mathau, Yun Ni, Lindsey Pavao, Wei Ping, David W. Romero, Misha Smelyanskiy, Shuran Song, Lyne Tchapmi, Andrew Z. Wang, Boxin Wang, Haoxiang Wang, Fangyin Wei, Jiashu Xu, Yao Xu, Xiaodong Yang, Zhuolin Yang, Xiaohui Zeng, Zhe Zhang•Mar 18, 2025•292
Scale-wise Distillation of Diffusion Models
Nikita Starodubcev, Denis Kuznedelev, Artem Babenko, Dmitry Baranchuk•Mar 20, 2025•284
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
Muyao Li, Zihao Wang, Kaichen He, Xiaojian Ma, Yitao Liang•Mar 20, 2025•252
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang, Chris Ngo•Mar 20, 2025•252
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai•Mar 18, 2025•255
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu•Mar 20, 2025•245
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica•Mar 17, 2025•242
Inside-Out: Hidden Factual Knowledge in LLMs
Zorik Gekhman, Eyal Ben David, Hadas Orgad, Eran Ofek, Yonatan Belinkov, Idan Szpector, Jonathan Herzig, Roi Reichart•Mar 19, 2025•221
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao, Haoxuan You, Yang Sui, Can Qin, Huan Wang•Mar 20, 2025•203
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, Chao Li, Sheng Xu, Dezhi Chen, Yun Chen, Zuo Bai, Liwen Zhang•Mar 20, 2025•204
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
Qizhi Pei, Lijun Wu, Zhuoshi Pan, Yu Li, Honglin Lin, Chenlin Ming, Xin Gao, Conghui He, Rui Yan•Mar 20, 2025•172
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, Liefeng Bo•Mar 13, 2025•172
SynCity: Training-Free Generation of 3D Worlds
Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, Andrea Vedaldi•Mar 20, 2025•163
M3: 3D-Spatial MultiModal Memory
Xueyan Zou, Yuchen Song, Ri-Zhao Qiu, Xuanbin Peng, Jianglong Ye, Sifei Liu, Xiaolong Wang•Mar 20, 2025•132
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Yuheng Yuan, Qiuhong Shen, Xingyi Yang, Xinchao Wang•Mar 20, 2025•112
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang, Shumin Deng, Huajun Chen, Nanyun Peng•Mar 20, 2025•112
Ultra-Resolution Adaptation with Ease
Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang•Mar 20, 2025•112
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min•Mar 20, 2025•112
XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Song Han•Mar 20, 2025•102
Tokenize Image as a Set
Zigang Geng, Mengde Xu, Han Hu, Shuyang Gu•Mar 20, 2025•103
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
Quanhao Li, Zhen Xing, Rui Wang, Hui Zhang, Qi Dai, Zuxuan Wu•Mar 20, 2025•82
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
Han-Hung Lee, Qinghong Han, Angel X. Chang•Mar 20, 2025•82
SALT: Singular Value Adaptation with Low-Rank Transformation
Abdelrahman Elsayed, Sarim Hashmi, Mohammed Elseiagy, Hu Wang, Mohammad Yaqub, Ibrahim Almakky•Mar 20, 2025•82
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
Zhou Zhenglin, Ma Fan, Fan Hehe, Chua Tat-Seng•Mar 20, 2025•82
BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
Pierre Chambon, Baptiste Roziere, Benoit Sagot, Gabriel Synnaeve•Mar 19, 2025•72
Agents Play Thousands of 3D Video Games
Zhongwen Xu, Xianliang Wang, Siyi Li, Tao Yu, Liang Wang, Qiang Fu, Wei Yang•Mar 17, 2025•72
CLS-RL: Image Classification with Rule-Based Reinforcement Learning
Ming Li, Shitian Zhao, Jike Zhong, Yuxiang Lai, Kaipeng Zhang•Mar 20, 2025•62
Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling
Yanchen Luo, Zhiyuan Liu, Yi Zhao, Sihang Li, Kenji Kawaguchi, Tat-Seng Chua, Xiang Wang•Mar 19, 2025•62
Sonata: Self-Supervised Learning of Reliable Point Representations
Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub•Mar 20, 2025•52
Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens
Shuqi Lu, Haowei Lin, Lin Yao, Zhifeng Gao, Xiaohong Ji, Weinan E, Linfeng Zhang, Guolin Ke•Mar 20, 2025•52
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang•Mar 19, 2025•52
Make Your Training Flexible: Towards Deployment-Efficient Video Models
Chenting Wang, Kunchang Li, Tianxiang Jiang, Xiangyu Zeng, Yi Wang, Limin Wang•Mar 18, 2025•52
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, Juhwan Choi, YoungBin Kim•Mar 18, 2025•42
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
Hengjia Li, Lifan Jiang, Xi Xiao, Tianyang Wang, Hongwei Yi, Boxi Wu, Deng Cai•Mar 16, 2025•42
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Yuanxin Liu, Rui Zhu, Shuhuai Ren, Jiacong Wang, Haoyuan Guo, Xu Sun, Lu Jiang•Mar 13, 2025•42
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction
Ziyao Guo, Kaipeng Zhang, Michael Qizhe Shieh•Mar 20, 2025•32
Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content
Sai Kartheek Reddy Kasu, Shankar Biradar, Sunil Saumya•Mar 20, 2025•32
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim•Mar 20, 2025•32
Where do Large Vision-Language Models Look at when Answering Questions?
Xiaoying Xing, Chia-Wen Kuo, Li Fuxin, Yulei Niu, Fan Chen, Ming Li, Ying Wu, Longyin Wen, Sijie Zhu•Mar 18, 2025•22
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye, Xianhan Zeng, Fu Li, Chunyuan Li, Haoqi Fan•Mar 10, 2025•22
AIMI: Leveraging Future Knowledge and Personalization in Sparse Event Forecasting for Treatment Adherence
Abdullah Mamun, Diane J. Cook, Hassan Ghasemzadeh•Mar 20, 2025•12
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi, Eddy Ilg, Margret Keuper, Hideki Tanaka, Masao Utiyama, Raj Dabre, Steffen Eger, Simone Paolo Ponzetto•Mar 14, 2025•12
GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving
William Ljungbergh, Adam Lilja, Adam Tonderski, Arvid Laveno Ling, Carl Lindström, Willem Verbeke, Junsheng Fu, Christoffer Petersson, Lars Hammarstrand, Michael Felsberg•Mar 19, 2025•02
Why Personalizing Deep Learning-Based Code Completion Tools Matters
Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota•Mar 18, 2025•02