"It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services

Abstract

Recent advances in artificial intelligence (AI) speech generation and voice cloning have produced naturalistic speech and accurate voice replication, yet their influence on sociotechnical systems across diverse accents and linguistic traits is not fully understood. This study evaluates two synthetic AI voice services, Speechify and ElevenLabs, using a mixed-methods approach that combines surveys and interviews to assess technical performance and to uncover how users' lived experiences shape their perceptions of accent variation in these speech technologies. Our findings reveal disparities in technical performance across five regional English-language accents and show how current speech generation technologies may inadvertently reinforce linguistic privilege and accent-based discrimination, potentially creating new forms of digital exclusion. Overall, the study highlights the need for inclusive design and regulation and offers actionable insights for developers, policymakers, and organizations working toward equitable and socially responsible AI speech technologies.
