A Multimodal Symphony: Integrating Taste and Sound through Generative AI

March 4, 2025
Authors: Matteo Spanio, Massimiliano Zampini, Antonio Rodà, Franco Pierucci
cs.AI

Abstract

In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perception. Building on this foundational research, this article explores multimodal generative models capable of converting taste information into music. We provide a brief review of the state of the art in this field, highlighting key findings and methodologies. We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece. The results are promising: according to the participants' (n=111) evaluations, the fine-tuned model produces music that more coherently reflects the input taste descriptions than the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code, and pre-trained model at: https://osf.io/xs5jy/.
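The pipeline the abstract describes, conditioning MusicGEN on textual taste descriptions, can be illustrated with a minimal sketch. The snippet below uses the public facebook/musicgen-small checkpoint through the Hugging Face transformers API as a stand-in for the authors' fine-tuned weights (released at https://osf.io/xs5jy/), and the taste prompt is an invented example, not one from the paper's dataset.

```python
# Minimal sketch: text-to-music generation with a public MusicGen checkpoint.
# Assumption: the authors' fine-tuned weights (https://osf.io/xs5jy/) would be
# loaded the same way in place of facebook/musicgen-small.
import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

# Hypothetical taste description in the spirit of the paper's prompts.
taste_prompt = "a sweet, smooth, slowly unfolding melody with soft timbres"

inputs = processor(text=[taste_prompt], padding=True, return_tensors="pt")
# MusicGen emits ~50 audio tokens per second, so 512 tokens is roughly 10 s.
audio = model.generate(**inputs, do_sample=True, max_new_tokens=512)

rate = model.config.audio_encoder.sampling_rate  # 32 kHz for MusicGen
scipy.io.wavfile.write("taste_music.wav", rate=rate, data=audio[0, 0].numpy())
```

In the study itself, generations from the fine-tuned and non-fine-tuned models were compared by human raters for coherence with the input taste description.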
