Typhoon T1: An Open Thai Reasoning Model

Abstract

This paper introduces Typhoon T1, an open effort to develop an open Thai reasoning model. A reasoning model is a relatively new type of generative model built on top of large language models (LLMs). A reasoning model generates a long chain of thought before arriving at a final answer, an approach found to improve performance on complex tasks. However, details on developing such a model are limited, especially for reasoning models that can generate traces in a low-resource language. Typhoon T1 presents an open effort that dives into the details of developing a reasoning model in a more cost-effective way by leveraging supervised fine-tuning using open datasets, instead of reinforcement learning. This paper shares the details about synthetic data generation and training, as well as our dataset and model weights. Additionally, we provide insights gained from developing a reasoning model that generalizes across domains and is capable of generating reasoning traces in a low-resource language, using Thai as an example. We hope this open effort provides a foundation for further research in this field.
