IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property
April 22, 2025
Authors: Qiyao Wang, Guhong Chen, Hongbo Wang, Huaren Liu, Minghui Zhu, Zhifei Qin, Linwei Li, Yilin Yue, Shiqiang Wang, Jiayan Li, Yihang Wu, Ziqiang Liu, Longze Chen, Run Luo, Liyang Fan, Jiaming Li, Lei Zhang, Kan Xu, Hongfei Lin, Hamid Alinejad-Rokny, Shiwen Ni, Yuan Lin, Min Yang
cs.AI
Abstract
Intellectual Property (IP) is a unique domain that integrates technical and
legal knowledge, making it inherently complex and knowledge-intensive. As large
language models (LLMs) continue to advance, they show great potential for
processing IP tasks, enabling more efficient analysis, understanding, and
generation of IP-related content. However, existing datasets and benchmarks
either focus narrowly on patents or cover limited aspects of the IP field,
lacking alignment with real-world scenarios. To bridge this gap, we introduce
the first comprehensive IP task taxonomy and a large, diverse bilingual
benchmark, IPBench, covering 8 IP mechanisms and 20 tasks. This benchmark is
designed to evaluate LLMs in real-world intellectual property applications,
encompassing both understanding and generation. We benchmark 16 LLMs, ranging
from general-purpose to domain-specific models, and find that even the
best-performing model achieves only 75.8% accuracy, revealing substantial room
for improvement. Notably, open-source IP and law-oriented models lag behind
closed-source general-purpose models. We publicly release all data and code of
IPBench and will continue to update it with additional IP-related tasks to
better reflect real-world challenges in the intellectual property domain.