LML: Language Model Learning a Dataset for Data-Augmented Prediction

Abstract

This paper introduces a new approach that uses Large Language Models (LLMs) for classification tasks, which are typically handled with Machine Learning (ML) models. Unlike ML models, which rely heavily on data cleaning and feature engineering, this method streamlines the process by using LLMs. The paper proposes a new concept called "Language Model Learning (LML)", powered by a new method called "Data-Augmented Prediction (DAP)". Classification is performed by LLMs in a way that mirrors how a human manually explores and understands the data and then decides on classifications using the data as a reference. The training data is summarized and evaluated to determine which features contribute most to each label. During DAP, the system uses the data summary to automatically create a query, which is then used to retrieve relevant rows from the dataset. The LLM generates a classification from the data summary and the relevant rows, aiming for satisfactory accuracy even on complex data. Using both the data summary and similar rows in DAP makes the decision-making context-aware. The proposed method includes the words "Act as an Explainable Machine Learning Model" in the prompt to make predictions interpretable, allowing users to review the logic behind each prediction. In some test cases, the system scored an accuracy above 90%, demonstrating its effectiveness and suggesting its potential to outperform conventional ML models in various scenarios. The code is available at https://github.com/Pro-GenAI/LML-DAP.
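The DAP flow described above (summarize the training data, derive a query from the summary, retrieve similar rows, then prompt the LLM) can be illustrated with a minimal sketch. This is not the LML-DAP repository's implementation. Every concrete choice here is an assumption made for illustration: the "data summary" is reduced to per-label feature means, the "query" is the test row's values on the features whose means differ most across labels, retrieval is a plain L1 nearest-neighbour ranking, and `llm` is a hypothetical callable that takes a prompt string and returns a label. All function names are likewise hypothetical.

```python
from statistics import mean
from typing import Any, Callable

Row = dict[str, Any]


def summarize(rows: list[Row], label_key: str) -> dict[str, dict[str, float]]:
    """Assumed stand-in for the data summary: mean of each numeric feature per label."""
    by_label: dict[str, list[Row]] = {}
    for row in rows:
        by_label.setdefault(str(row[label_key]), []).append(row)
    features = [k for k in rows[0] if k != label_key]
    return {label: {f: mean(r[f] for r in group) for f in features}
            for label, group in by_label.items()}


def key_features(summary: dict[str, dict[str, float]], top_k: int = 2) -> list[str]:
    """Features whose per-label means are spread furthest apart (max minus min)."""
    per_label = list(summary.values())
    spread = {f: max(s[f] for s in per_label) - min(s[f] for s in per_label)
              for f in per_label[0]}
    return sorted(spread, key=spread.get, reverse=True)[:top_k]


def retrieve(rows: list[Row], query: dict[str, float], n: int = 3) -> list[Row]:
    """Return the n training rows closest to the query (L1 distance on query features)."""
    return sorted(rows, key=lambda r: sum(abs(r[f] - v) for f, v in query.items()))[:n]


def build_prompt(summary: dict[str, dict[str, float]], similar: list[Row], test_row: Row) -> str:
    """Combine the summary, the retrieved rows, and the row to classify into one prompt."""
    lines = ["Act as an Explainable Machine Learning Model.",
             "Data summary (per-label feature means):"]
    lines += [f"  {label}: {feats}" for label, feats in summary.items()]
    lines.append("Most similar training rows:")
    lines += [f"  {row}" for row in similar]
    lines.append(f"Classify this row and explain your reasoning: {test_row}")
    return "\n".join(lines)


def predict(rows: list[Row], label_key: str, test_row: Row,
            llm: Callable[[str], str]) -> str:
    """Hypothetical DAP pipeline: summary -> query -> retrieval -> LLM prompt."""
    summary = summarize(rows, label_key)
    query = {f: test_row[f] for f in key_features(summary)}
    similar = retrieve(rows, query)
    return llm(build_prompt(summary, similar, test_row))
```

In a real setting, `llm` would wrap whichever chat-completion client is in use. The prompt keeps the summary and the retrieved rows side by side so the model's explanation can refer to both, which is the context-aware behaviour the abstract attributes to DAP.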