VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models
November 28, 2024
Authors: Jeongho Ju, Daeyoung Kim, SunYoung Park, Youngjune Kim
cs.AI
Abstract
In this paper, we introduce VARCO-VISION, an open-source Korean-English vision-language
model (VLM). We employ a step-by-step training strategy that allows the model to learn
both linguistic and visual information while preserving the backbone model's knowledge.
Compared to models of similar size, our model demonstrates outstanding performance in
diverse settings requiring bilingual image-text understanding and generation abilities.
VARCO-VISION is also capable of grounding, referring, and OCR, expanding its usage and
potential applications to real-world scenarios. In addition to the model, we release
five Korean evaluation datasets: four closed-set benchmarks and one open-set benchmark.
We anticipate that our milestone will broaden the opportunities for AI researchers
aiming to train VLMs. VARCO-VISION is available at
https://huggingface.co/NCSOFT/VARCO-VISION-14B.
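Since the checkpoint is published on the Hugging Face Hub, it can in principle be loaded with the standard transformers API. The sketch below is a minimal, hypothetical usage example that assumes the model follows the LLaVA-OneVision layout in transformers; the model class, chat-prompt format, and image file name are assumptions, so consult the model card for the exact usage:

```python
# Minimal sketch: loading VARCO-VISION from the Hugging Face Hub.
# Assumption: the checkpoint is compatible with the LLaVA-OneVision
# classes in transformers. Verify against the model card before use.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "NCSOFT/VARCO-VISION-14B"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a bilingual image-text prompt using the processor's chat template.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            # Korean prompt: "Please describe this image."
            {"type": "text", "text": "이 이미지를 설명해 주세요."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # placeholder input image
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```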