可執行的功能抽象：推斷高階數學問題的生成式程式

摘要

科學家們常常從具體問題的實例中推導出抽象的程序，並利用這些抽象來生成新的相關實例。例如，編碼系統正式規則和屬性的程序，在從強化學習（程序化環境）到物理學（模擬引擎）等多個領域中都有其應用價值。這些程序可視為函數，根據其參數化（如網格世界配置或初始物理條件）執行以產生不同的輸出。我們引入“可執行功能抽象”（EFA）這一術語來指代此類針對數學問題的程序。類似EFA的結構已被證明在數學推理中作為問題生成器用於壓力測試模型方面具有實用性。然而，先前的工作僅限於小學數學的抽象（其簡單規則易於編碼成程序），而生成高級數學的EFA至今仍需人工設計。我們探索了高級數學問題EFA的自動構建。我們將自動構建EFA的任務操作化為程序合成任務，並開發了EFAGen，它基於一個種子數學問題及其逐步解決方案來條件化大型語言模型（LLM），以生成忠實於種子問題背後廣義問題及解決方案類別的候選EFA程序。此外，我們以可執行單元測試的形式形式化了任何有效EFA必須具備的屬性，並展示了如何利用這些測試作為可驗證的獎勵來訓練LLM，使其成為更優秀的EFA編寫者。我們證明，EFAGen構建的EFA行為合理，既忠實於種子問題，又能產生可學習的問題變體，且EFAGen能夠從多樣化的競賽級數學問題來源中推斷出EFA。最後，我們展示了模型編寫的EFA的下游應用，例如發現對學習者而言更難或更易解決的問題變體，以及數據生成。

English

Scientists often infer abstract procedures from specific instances of problems and use the abstractions to generate new, related instances. For example, programs encoding the formal rules and properties of a system have been useful in fields ranging from RL (procedural environments) to physics (simulation engines). These programs can be seen as functions which execute to different outputs based on their parameterizations (e.g., gridworld configuration or initial physical conditions). We introduce the term EFA (Executable Functional Abstraction) to denote such programs for math problems. EFA-like constructs have been shown to be useful for math reasoning as problem generators for stress-testing models. However, prior work has been limited to abstractions for grade-school math (whose simple rules are easy to encode in programs), while generating EFAs for advanced math has thus far required human engineering. We explore the automatic construction of EFAs for advanced math problems. We operationalize the task of automatically constructing EFAs as a program synthesis task, and develop EFAGen, which conditions an LLM on a seed math problem and its step-by-step solution to generate candidate EFA programs that are faithful to the generalized problem and solution class underlying the seed problem. Furthermore, we formalize properties any valid EFA must possess in terms of executable unit tests, and show how the tests can be used as verifiable rewards to train LLMs to become better writers of EFAs. We demonstrate that EFAs constructed by EFAGen behave rationally by remaining faithful to seed problems, produce learnable problem variations, and that EFAGen can infer EFAs across multiple diverse sources of competition-level math problems. Finally, we show downstream uses of model-written EFAs e.g. finding problem variations that are harder or easier for a learner to solve, as well as data generation.

可執行的功能抽象：推斷高階數學問題的生成式程式

Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems

摘要

Summary

Support

Support