PodAgent: A Comprehensive Framework for Podcast Generation

Abstract

Existing automatic audio generation methods struggle to produce podcast-like audio programs effectively. The key challenges lie in generating in-depth content and in producing appropriate, expressive voices. This paper proposes PodAgent, a comprehensive framework for creating audio programs. PodAgent 1) generates informative topic-discussion content by designing a Host-Guest-Writer multi-agent collaboration system, 2) builds a voice pool for suitable voice-role matching, and 3) uses an LLM-enhanced speech synthesis method to generate expressive conversational speech. Given the absence of standardized evaluation criteria for podcast-like audio generation, we develop comprehensive assessment guidelines to evaluate the model's performance. Experimental results demonstrate PodAgent's effectiveness: it significantly surpasses direct GPT-4 generation in topic-discussion dialogue content, achieves 87.4% voice-matching accuracy, and produces more expressive speech through LLM-guided synthesis. Demo page: https://podcast-agent.github.io/demo/. Source code: https://github.com/yujxx/PodAgent.

