AutoTrain: No-code training for state-of-the-art models

Abstract

With the advancements in open-source models, training (or finetuning) models on custom datasets has become a crucial part of developing solutions tailored to specific industrial or open-source applications. Yet there is no single tool that simplifies the training process across different modalities and tasks. We introduce AutoTrain (aka AutoTrain Advanced), an open-source, no-code tool and library for training (or finetuning) models on a range of tasks: large language model (LLM) finetuning, text classification/regression, token classification, sequence-to-sequence tasks, finetuning of sentence transformers, visual language model (VLM) finetuning, image classification/regression, and even classification and regression on tabular data. AutoTrain Advanced provides best practices for training models on custom datasets. The library is available at https://github.com/huggingface/autotrain-advanced. AutoTrain can run in a fully local mode or on cloud machines, and it works with tens of thousands of models shared on the Hugging Face Hub, as well as their variants.