Daily Papers

One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Daniil Selikhanovych, David Li, Aleksei Leonov, Nikita Gushchin, Sergei Kushneriuk, Alexander Filippov, Evgeny Burnaev, Iaroslav Koshelev, Alexander Korotin · Mar 17, 2025

Survey on Evaluation of LLM-based Agents

Asaf Yehudai, Lilach Eden, Alan Li, Guy Uziel, Yilun Zhao, Roy Bar-Haim, Arman Cohan, Michal Shmueli-Scheuer · Mar 20, 2025

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Hu · Mar 20, 2025

Unleashing Vecset Diffusion Model for Fast Shape Generation

Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qinxiang Lin, Jinwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue · Mar 20, 2025

Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

NVIDIA, Alisson Azzolini, Hannah Brandon, Prithvijit Chattopadhyay, Huayu Chen, Jinju Chu, Yin Cui, Jenna Diamond, Yifan Ding, Francesco Ferroni, Rama Govindaraju, Jinwei Gu, Siddharth Gururani, Imad El Hanafi, Zekun Hao, Jacob Huffman, Jingyi Jin, Brendan Johnson, Rizwan Khan, George Kurian, Elena Lantz, Nayeon Lee, Zhaoshuo Li, Xuan Li, Tsung-Yi Lin, Yen-Chen Lin, Ming-Yu Liu, Andrew Mathau, Yun Ni, Lindsey Pavao, Wei Ping, David W. Romero, Misha Smelyanskiy, Shuran Song, Lyne Tchapmi, Andrew Z. Wang, Boxin Wang, Haoxiang Wang, Fangyin Wei, Jiashu Xu, Yao Xu, Xiaodong Yang, Zhuolin Yang, Xiaohui Zeng, Zhe Zhang · Mar 18, 2025

Scale-wise Distillation of Diffusion Models

Nikita Starodubcev, Denis Kuznedelev, Artem Babenko, Dmitry Baranchuk · Mar 20, 2025

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Muyao Li, Zihao Wang, Kaichen He, Xiaojian Ma, Yitao Liang · Mar 20, 2025

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai · Mar 18, 2025

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu · Mar 20, 2025

Why Do Multi-Agent LLM Systems Fail?

Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica · Mar 17, 2025

Inside-Out: Hidden Factual Knowledge in LLMs

Zorik Gekhman, Eyal Ben David, Hadas Orgad, Eran Ofek, Yonatan Belinkov, Idan Szpector, Jonathan Herzig, Roi Reichart · Mar 19, 2025

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Keda Tao, Haoxuan You, Yang Sui, Can Qin, Huan Wang · Mar 20, 2025

Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning

Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, Chao Li, Sheng Xu, Dezhi Chen, Yun Chen, Zuo Bai, Liwen Zhang · Mar 20, 2025

MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion

Qizhi Pei, Lijun Wu, Zhuoshi Pan, Yu Li, Honglin Lin, Chenlin Ming, Xin Gao, Conghui He, Rui Yan · Mar 20, 2025

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, Liefeng Bo · Mar 13, 2025

SynCity: Training-Free Generation of 3D Worlds

Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, Andrea Vedaldi · Mar 20, 2025

M3: 3D-Spatial MultiModal Memory

Xueyan Zou, Yuchen Song, Ri-Zhao Qiu, Xuanbin Peng, Jianglong Ye, Sifei Liu, Xiaolong Wang · Mar 20, 2025

1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering

Yuheng Yuan, Qiuhong Shen, Xingyi Yang, Xinchao Wang · Mar 20, 2025

CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners

Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang, Shumin Deng, Huajun Chen, Nanyun Peng · Mar 20, 2025

Ultra-Resolution Adaptation with Ease

Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang · Mar 20, 2025

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min · Mar 20, 2025

XAttention: Block Sparse Attention with Antidiagonal Scoring

Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Song Han · Mar 20, 2025

Tokenize Image as a Set

Zigang Geng, Mengde Xu, Han Hu, Shuyang Gu · Mar 20, 2025

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang, Hui Zhang, Qi Dai, Zuxuan Wu · Mar 20, 2025

NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes

Han-Hung Lee, Qinghong Han, Angel X. Chang · Mar 20, 2025

SALT: Singular Value Adaptation with Low-Rank Transformation

Abdelrahman Elsayed, Sarim Hashmi, Mohammed Elseiagy, Hu Wang, Mohammad Yaqub, Ibrahim Almakky · Mar 20, 2025

Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion

Zhou Zhenglin, Ma Fan, Fan Hehe, Chua Tat-Seng · Mar 20, 2025

BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?

Pierre Chambon, Baptiste Roziere, Benoit Sagot, Gabriel Synnaeve · Mar 19, 2025

Agents Play Thousands of 3D Video Games

Zhongwen Xu, Xianliang Wang, Siyi Li, Tao Yu, Liang Wang, Qiang Fu, Wei Yang · Mar 17, 2025

CLS-RL: Image Classification with Rule-Based Reinforcement Learning

Ming Li, Shitian Zhao, Jike Zhong, Yuxiang Lai, Kaipeng Zhang · Mar 20, 2025

Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling

Yanchen Luo, Zhiyuan Liu, Yi Zhao, Sihang Li, Kenji Kawaguchi, Tat-Seng Chua, Xiang Wang · Mar 19, 2025

Sonata: Self-Supervised Learning of Reliable Point Representations

Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub · Mar 20, 2025

Uni-3DAR: Unified 3D Generation and Understanding via Autoregression on Compressed Spatial Tokens

Shuqi Lu, Haowei Lin, Lin Yao, Zhifeng Gao, Xiaohong Ji, Weinan E, Linfeng Zhang, Guolin Ke · Mar 20, 2025

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang · Mar 19, 2025

Make Your Training Flexible: Towards Deployment-Efficient Video Models

Chenting Wang, Kunchang Li, Tianxiang Jiang, Xiangyu Zeng, Yi Wang, Limin Wang · Mar 18, 2025

See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias

JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, Juhwan Choi, YoungBin Kim · Mar 18, 2025

MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization

Hengjia Li, Lifan Jiang, Xi Xiao, Tianyang Wang, Hongwei Yi, Boxi Wu, Deng Cai · Mar 16, 2025

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

Yuanxin Liu, Rui Zhu, Shuhuai Ren, Jiacong Wang, Haoyuan Guo, Xu Sun, Lu Jiang · Mar 13, 2025

Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction

Ziyao Guo, Kaipeng Zhang, Michael Qizhe Shieh · Mar 20, 2025

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling

Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim · Mar 20, 2025

Where do Large Vision-Language Models Look at when Answering Questions?

Xiaoying Xing, Chia-Wen Kuo, Li Fuxin, Yulei Niu, Fan Chen, Ming Li, Ying Wu, Longyin Wen, Sijie Zhu · Mar 18, 2025

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning

Qinghao Ye, Xianhan Zeng, Fu Li, Chunyuan Li, Haoqi Fan · Mar 10, 2025

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

Jonas Belouadi, Eddy Ilg, Margret Keuper, Hideki Tanaka, Masao Utiyama, Raj Dabre, Steffen Eger, Simone Paolo Ponzetto · Mar 14, 2025

GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving

William Ljungbergh, Adam Lilja, Adam Tonderski, Arvid Laveno Ling, Carl Lindström, Willem Verbeke, Junsheng Fu, Christoffer Petersson, Lars Hammarstrand, Michael Felsberg · Mar 19, 2025

Why Personalizing Deep Learning-Based Code Completion Tools Matters

Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota · Mar 18, 2025