Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation

Abstract

Sequential recommendation (SR) systems have evolved significantly over the past decade, moving from traditional collaborative filtering to deep learning approaches and, more recently, to large language models (LLMs). While the adoption of LLMs has driven substantial advances, these models inherently lack collaborative filtering information and rely primarily on textual content, neglecting other modalities and thus failing to achieve optimal recommendation performance. To address this limitation, we propose Molar, a Multimodal large language sequential recommendation framework that integrates multiple content modalities with ID information to capture collaborative signals effectively. Molar employs a multimodal LLM (MLLM) to generate unified item representations from both textual and non-textual data, which enables comprehensive multimodal modeling and enriches item embeddings. It also incorporates collaborative filtering signals through a post-alignment mechanism that aligns user representations from content-based and ID-based models, supporting precise personalization and robust performance. By combining multimodal content with collaborative filtering signals, Molar captures both user interests and contextual semantics, leading to higher recommendation accuracy. Extensive experiments show that Molar significantly outperforms traditional and LLM-based baselines, highlighting its strength in using multimodal data and collaborative signals for sequential recommendation tasks. The source code is available at https://anonymous.4open.science/r/Molar-8B06/.
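
The post-alignment idea can be illustrated with a small sketch. The abstract does not specify the alignment objective, so the symmetric contrastive (InfoNCE-style) loss below is an assumption chosen for illustration, not the authors' implementation. The function names (`alignment_loss`, `info_nce`, `cosine_similarity_matrix`), the temperature value, and the toy embeddings are all hypothetical. The sketch treats row i of the content-based user embeddings and row i of the ID-based user embeddings as the same user, and penalizes the pair when the two views disagree.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(a_rows, b_rows):
    """Return sim[i][j] = cosine similarity between a_rows[i] and b_rows[j]."""
    a_unit = [l2_normalize(row) for row in a_rows]
    b_unit = [l2_normalize(row) for row in b_rows]
    return [[sum(x * y for x, y in zip(a, b)) for b in b_unit] for a in a_unit]


def info_nce(sim, temperature):
    """Mean cross-entropy in which the positive for row i is column i."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_z - logits[i]
    return total / len(sim)


def alignment_loss(content_users, id_users, temperature=0.1):
    """Hypothetical symmetric contrastive loss between two user-embedding views.

    Assumption: content_users[i] and id_users[i] describe the same user,
    and all other rows in the batch act as negatives.
    """
    sim = cosine_similarity_matrix(content_users, id_users)
    sim_t = [list(col) for col in zip(*sim)]  # ID-to-content direction
    return 0.5 * (info_nce(sim, temperature) + info_nce(sim_t, temperature))


# Toy usage: two users with 2-dimensional embeddings from each model.
content_users = [[1.0, 0.0], [0.0, 1.0]]
id_users_matched = [[2.0, 0.0], [0.0, 3.0]]   # same directions as the content view
id_users_swapped = [[0.0, 3.0], [2.0, 0.0]]   # users paired with the wrong rows

loss_matched = alignment_loss(content_users, id_users_matched)
loss_swapped = alignment_loss(content_users, id_users_swapped)
```

In this toy setup the matched pairing gives a loss close to zero, while the swapped pairing gives a much larger one, which is the behavior an alignment objective of this kind is meant to have. A real system would compute such a loss on batched tensors inside a deep learning framework rather than on Python lists.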
