可执行功能抽象：面向高等数学问题的生成式程序推断

摘要

科学家们常从具体问题实例中提炼出抽象程序，并利用这些抽象生成新的相关实例。例如，编码系统正式规则与属性的程序已在多个领域发挥效用，从强化学习（程序化环境）到物理学（仿真引擎）。这些程序可视为根据参数化（如网格世界配置或初始物理条件）执行不同输出的函数。我们引入“可执行功能抽象”（EFA）这一术语，用以指代数学问题中的此类程序。类似EFA的结构已被证明在数学推理中作为问题生成器对模型进行压力测试时十分有用。然而，先前工作仅限于为小学级别数学（其简单规则易于程序编码）构建抽象，而生成高级数学的EFA至今仍需人工设计。我们探索了自动构建高级数学问题EFA的方法，将这一任务形式化为程序合成任务，并开发了EFAGen。EFAGen基于一个种子数学问题及其逐步解答，利用大语言模型（LLM）生成候选EFA程序，这些程序忠实于种子问题背后的广义问题及解答类别。此外，我们通过可执行的单元测试形式化任何有效EFA必须具备的属性，并展示了如何将这些测试作为可验证的奖励，用于训练LLM成为更优秀的EFA编写者。我们通过实验证明，EFAGen构建的EFA在保持对种子问题忠实性的同时，能产生可学习的问题变体，且EFAGen能够从多种竞赛级数学问题来源中推断出EFA。最后，我们展示了模型编写的EFA在下游应用中的用途，例如发现对学习者而言更难或更易的问题变体，以及数据生成。

English

Scientists often infer abstract procedures from specific instances of problems and use the abstractions to generate new, related instances. For example, programs encoding the formal rules and properties of a system have been useful in fields ranging from RL (procedural environments) to physics (simulation engines). These programs can be seen as functions which execute to different outputs based on their parameterizations (e.g., gridworld configuration or initial physical conditions). We introduce the term EFA (Executable Functional Abstraction) to denote such programs for math problems. EFA-like constructs have been shown to be useful for math reasoning as problem generators for stress-testing models. However, prior work has been limited to abstractions for grade-school math (whose simple rules are easy to encode in programs), while generating EFAs for advanced math has thus far required human engineering. We explore the automatic construction of EFAs for advanced math problems. We operationalize the task of automatically constructing EFAs as a program synthesis task, and develop EFAGen, which conditions an LLM on a seed math problem and its step-by-step solution to generate candidate EFA programs that are faithful to the generalized problem and solution class underlying the seed problem. Furthermore, we formalize properties any valid EFA must possess in terms of executable unit tests, and show how the tests can be used as verifiable rewards to train LLMs to become better writers of EFAs. We demonstrate that EFAs constructed by EFAGen behave rationally by remaining faithful to seed problems, produce learnable problem variations, and that EFAGen can infer EFAs across multiple diverse sources of competition-level math problems. Finally, we show downstream uses of model-written EFAs e.g. finding problem variations that are harder or easier for a learner to solve, as well as data generation.

可执行功能抽象：面向高等数学问题的生成式程序推断

Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems

摘要

Summary

Support

Support