Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation

December 24, 2024
Authors: Yucong Luo, Qitao Qin, Hao Zhang, Mingyue Cheng, Ruiran Yan, Kefan Wang, Jie Ouyang
cs.AI

Abstract

Sequential recommendation (SR) systems have evolved significantly over the past decade, moving from traditional collaborative filtering to deep learning approaches and, more recently, to large language models (LLMs). Although the adoption of LLMs has driven substantial advances, these models inherently lack collaborative filtering information: they rely primarily on textual content, neglect other modalities, and therefore fall short of optimal recommendation performance. To address this limitation, we propose Molar, a multimodal large language sequential recommendation framework that integrates multiple content modalities with ID information to capture collaborative signals effectively. Molar employs a multimodal LLM (MLLM) to generate unified item representations from both textual and non-textual data, enabling comprehensive multimodal modeling and enriching item embeddings. It also incorporates collaborative filtering signals through a post-alignment mechanism that aligns user representations from content-based and ID-based models, supporting precise personalization and robust performance. By combining multimodal content with collaborative filtering signals, Molar captures both user interests and contextual semantics, which improves recommendation accuracy. Extensive experiments show that Molar significantly outperforms traditional and LLM-based baselines, highlighting its strength in using multimodal data and collaborative signals for sequential recommendation tasks. The source code is available at https://anonymous.4open.science/r/Molar-8B06/.
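The abstract does not spell out how the post-alignment mechanism is computed. As a rough illustration only, one common way to align two sets of user representations is a symmetric contrastive (InfoNCE-style) loss, in which each user's content-based embedding is pulled toward the same user's ID-based embedding and pushed away from other users' embeddings. The sketch below assumes that formulation; the function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the Molar source code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a, b):
    """Pairwise cosine similarity between two lists of vectors."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def info_nce(sim, temperature):
    """Mean cross-entropy where the positive for row i is column i."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(sim)


def alignment_loss(content_users, id_users, temperature=0.1):
    """Symmetric contrastive loss between two views of the same users.

    Assumption: row i of both inputs describes the same user. This is an
    illustrative stand-in, not the paper's confirmed objective.
    """
    sim = cosine_matrix(content_users, id_users)
    sim_t = [list(col) for col in zip(*sim)]  # ID-to-content direction
    return 0.5 * (info_nce(sim, temperature) + info_nce(sim_t, temperature))


# Toy example: two users, 2-dimensional embeddings (made-up values).
content_users = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss(content_users, [[0.9, 0.1], [0.1, 0.9]])
swapped = alignment_loss(content_users, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)
```

In this toy case the loss is lower when each content-based vector points in roughly the same direction as its ID-based counterpart, which is the behavior an alignment objective of this kind is meant to reward.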

