Evaluating Multimodal Generative AI with Korean Educational Standards
February 21, 2025
Authors: Sanghee Park, Geewook Kim
cs.AI
Abstract
This paper presents the Korean National Educational Test Benchmark (KoNET), a
new benchmark designed to evaluate Multimodal Generative AI Systems using
Korean national educational tests. KoNET comprises four exams: the Korean
Elementary General Educational Development Test (KoEGED), Middle (KoMGED), High
(KoHGED), and College Scholastic Ability Test (KoCSAT). These exams are
renowned for their rigorous standards and diverse questions, facilitating a
comprehensive analysis of AI performance across different educational levels.
By focusing on Korean, KoNET provides insights into model performance in
less-explored languages. We assess a range of models (open-source,
open-access, and closed APIs) by examining difficulties, subject diversity,
and human error rates. The code and dataset builder will be made fully
open-sourced at https://github.com/naver-ai/KoNET.