Generating π-Functional Molecules Using STGG+ with Active Learning
February 20, 2025
Authors: Alexia Jolicoeur-Martineau, Yan Zhang, Boris Knyazev, Aristide Baratin, Cheng-Hao Liu
cs.AI
Abstract
Generating novel molecules with out-of-distribution properties is a major
challenge in molecular discovery. While supervised learning methods generate
high-quality molecules similar to those in a dataset, they struggle to
generalize to out-of-distribution properties. Reinforcement learning can
explore new chemical spaces but is prone to reward hacking and often generates
non-synthesizable molecules. In this work, we address this problem by
integrating a state-of-the-art supervised learning method, STGG+, in an active
learning loop. Our approach iteratively generates, evaluates, and fine-tunes
STGG+ to continuously expand its knowledge. We denote this approach STGG+AL. We
apply STGG+AL to the design of organic pi-functional materials, specifically
two challenging tasks: 1) generating highly absorptive molecules characterized
by high oscillator strength and 2) designing absorptive molecules with
reasonable oscillator strength in the near-infrared (NIR) range. The generated
molecules are validated and rationalized in silico with time-dependent density
functional theory. Our results demonstrate that our method is highly effective
in generating novel molecules with high oscillator strength, unlike
existing methods such as reinforcement learning (RL). We open-source
our active-learning code along with our Conjugated-xTB dataset containing 2.9
million pi-conjugated molecules and the function for approximating the
oscillator strength and absorption wavelength (based on sTDA-xTB).
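The generate, evaluate, and fine-tune cycle described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: `ToyGenerator`, `toy_oracle`, and `active_learning` are hypothetical names, the "molecule" is a single number, the oracle is a simple quadratic standing in for an expensive property calculation such as sTDA-xTB, and "fine-tuning" is reduced to refitting the generator on the best samples seen so far.

```python
import random


def toy_oracle(x: float) -> float:
    """Hypothetical stand-in for a property evaluation (higher is better).

    The optimum at x = 5 lies far outside the initial data, mimicking an
    out-of-distribution target.
    """
    return -(x - 5.0) ** 2


class ToyGenerator:
    """Hypothetical placeholder for a generative model such as STGG+."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.std = 1.0

    def fit(self, samples: list[float]) -> None:
        # "Fine-tuning" here just moves the sampling mean to the data mean.
        self.mean = sum(samples) / len(samples)

    def generate(self, n: int, rng: random.Random) -> list[float]:
        return [rng.gauss(self.mean, self.std) for _ in range(n)]


def active_learning(n_rounds: int = 20, n_generate: int = 50,
                    top_k: int = 10, seed: int = 0) -> list[float]:
    """Run the toy loop and return the best score found after each round."""
    rng = random.Random(seed)

    # Initial dataset drawn far from the optimum.
    dataset = [(x, toy_oracle(x)) for x in (rng.gauss(0.0, 1.0) for _ in range(100))]
    model = ToyGenerator()
    model.fit([x for x, _ in dataset])

    history = []
    for _ in range(n_rounds):
        candidates = model.generate(n_generate, rng)          # 1) generate
        scored = [(x, toy_oracle(x)) for x in candidates]      # 2) evaluate
        dataset.extend(scored)
        dataset.sort(key=lambda pair: pair[1], reverse=True)
        model.fit([x for x, _ in dataset[:top_k]])             # 3) fine-tune
        history.append(dataset[0][1])
    return history


if __name__ == "__main__":
    for round_index, best in enumerate(active_learning(), start=1):
        print(f"round {round_index:2d}: best score so far = {best:.4f}")
```

Because every evaluated sample is kept in the dataset, the best score can only stay level or improve from round to round; the real method differs in that the generator is a conditional molecular model and the evaluation is a quantum-chemistry approximation.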