VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models

November 28, 2024
Authors: Jeongho Ju, Daeyoung Kim, SunYoung Park, Youngjune Kim
cs.AI

Abstract

In this paper, we introduce an open-source Korean-English vision-language model (VLM), VARCO-VISION. We incorporate a step-by-step training strategy that allows a model to learn both linguistic and visual information while preserving the backbone model's knowledge. Our model demonstrates outstanding performance in diverse settings requiring bilingual image-text understanding and generation abilities compared to models of similar size. VARCO-VISION is also capable of grounding, referring, and OCR, expanding its usage and potential applications in real-world scenarios. In addition to the model, we release five Korean evaluation datasets, including four closed-set benchmarks and one open-set benchmark. We anticipate that our milestone will broaden the opportunities for AI researchers aiming to train VLMs. VARCO-VISION is available at https://huggingface.co/NCSOFT/VARCO-VISION-14B.
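Since the checkpoint is released on Hugging Face, a rough usage sketch follows. It assumes the model loads through transformers' generic Auto classes (AutoProcessor, AutoModelForVision2Seq) with a chat-template-style prompt; the model card at the URL above is authoritative and may specify a more specific model class. The image path and Korean prompt are placeholders for illustration.

```python
# Minimal sketch, assuming the checkpoint is compatible with transformers'
# generic Auto classes; consult the model card for the exact loading code.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "NCSOFT/VARCO-VISION-14B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 14B model on GPU
    device_map="auto",
)

# Placeholder image path; any RGB image works.
image = Image.open("sample.jpg").convert("RGB")

# A Korean prompt, reflecting the bilingual focus described in the abstract.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "이 이미지를 설명해 주세요."},  # "Describe this image."
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Cast floating-point inputs to the model's half-precision dtype.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```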
