NRGBoost: Energy-Based Generative Boosted Trees

October 4, 2024
Author: João Bravo
cs.AI

Abstract

Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular data. We explore generative extensions of these popular algorithms with a focus on explicitly modeling the data density (up to a normalization constant), thus enabling other applications besides sampling. As our main contribution, we propose an energy-based generative boosting algorithm that is analogous to the second-order boosting implemented in popular packages like XGBoost. We show that, despite producing a generative model capable of handling inference tasks over any input variable, our proposed algorithm can achieve similar discriminative performance to GBDT on a number of real-world tabular datasets, outperforming alternative generative approaches. At the same time, we show that it is also competitive with neural-network-based models for sampling.
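
For readers unfamiliar with the "second-order boosting" that the abstract uses as its point of comparison, the sketch below shows the standard Newton-style leaf update from discriminative GBDT: each leaf value is set to w = -G / (H + lambda), where G and H are the summed gradients and Hessians of the loss over the examples in the leaf. This is not NRGBoost's generative algorithm, which the abstract does not spell out. The function names, the log-loss objective, and the toy data are assumptions chosen for illustration.

```python
import math


def newton_leaf_weight(grads, hessians, reg_lambda=1.0):
    """Second-order (Newton) leaf value: w = -G / (H + lambda)."""
    G = sum(grads)      # summed first derivatives of the loss in this leaf
    H = sum(hessians)   # summed second derivatives of the loss in this leaf
    return -G / (H + reg_lambda)


def logistic_grad_hess(y, score):
    """Gradient and Hessian of the log loss with respect to the raw score."""
    p = 1.0 / (1.0 + math.exp(-score))
    return p - y, p * (1.0 - p)


# Toy leaf (illustrative data): three examples, all currently scored 0.0.
labels = [1, 1, 0]
stats = [logistic_grad_hess(y, 0.0) for y in labels]
grads = [g for g, _ in stats]
hessians = [h for _, h in stats]

weight = newton_leaf_weight(grads, hessians, reg_lambda=1.0)
print(weight)
```

The leaf value is positive here because two of the three examples are positives that the current score of 0.0 underestimates. The regularizer lambda shrinks the step toward zero.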
