Evaluating Multimodal Generative AI with Korean Educational Standards

February 21, 2025
Authors: Sanghee Park, Geewook Kim
cs.AI

Abstract

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate multimodal generative AI systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), its middle-school (KoMGED) and high-school (KoHGED) counterparts, and the College Scholastic Ability Test (KoCSAT). These exams are known for their rigorous standards and diverse questions, enabling a comprehensive analysis of AI performance across educational levels. Because it focuses on Korean, KoNET offers insight into model performance in a less-explored language. We assess a range of models (open-source, open-access, and closed APIs) by examining question difficulty, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.

February 24, 2025