SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics
October 2, 2024
Authors: Zhiwen You, Kanyao Han, Haotian Zhu, Bertram Ludäscher, Jana Diesner
cs.AI
Abstract
Prompt-based fine-tuning has become an essential method for eliciting
information encoded in pre-trained language models for a variety of tasks,
including text classification. For multi-class classification tasks,
prompt-based fine-tuning under low-resource scenarios has resulted in
performance levels comparable to those of full fine-tuning methods. Previous
studies have used crafted prompt templates and verbalizers, mapping from the
label terms space to the class space, to solve the classification problem as a
masked language modeling task. However, cross-domain and fine-grained
prompt-based fine-tuning with an automatically enriched verbalizer remains
unexplored, mainly due to the difficulty and costs of manually selecting domain
label terms for the verbalizer, which requires humans with domain expertise. To
address this challenge, we introduce SciPrompt, a framework designed to
automatically retrieve scientific topic-related terms for low-resource text
classification tasks. To this end, we select semantically correlated and
domain-specific label terms within the context of scientific literature for
verbalizer augmentation. Furthermore, we propose a new verbalization strategy
that uses correlation scores as additional weights to enhance the prediction
performance of the language model during model tuning. Our method outperforms
state-of-the-art, prompt-based fine-tuning methods on scientific text
classification tasks under few-shot and zero-shot settings, especially in
classifying fine-grained and emerging scientific topics.
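The weighted verbalization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vocabulary, the label terms, and the correlation scores are all hypothetical, and normalizing the scores so that they sum to one per class is an assumption made here for simplicity. The sketch takes log-probabilities that a masked language model might assign at the mask position and aggregates them per class, weighting each label term by its correlation score.

```python
import math


def class_scores(mask_log_probs, vocab, verbalizer):
    """Aggregate mask-position log-probabilities of label terms per class.

    mask_log_probs: list of log p(token | prompt) at the mask position.
    vocab: dict mapping token string -> index into mask_log_probs.
    verbalizer: dict mapping class name -> list of (label_term, score),
        where score is a positive correlation score (hypothetical values).
    """
    scores = {}
    for cls, terms in verbalizer.items():
        # Keep only label terms that exist in the vocabulary.
        kept = [(vocab[t], s) for t, s in terms if t in vocab]
        total = sum(s for _, s in kept)
        if total <= 0:
            # No usable label terms for this class.
            scores[cls] = float("-inf")
            continue
        # Assumption: normalize correlation scores into per-class weights,
        # then take the weighted average of the label-term log-probabilities.
        scores[cls] = sum((s / total) * mask_log_probs[i] for i, s in kept)
    return scores


def predict(mask_log_probs, vocab, verbalizer):
    """Return the class with the highest aggregated score."""
    scores = class_scores(mask_log_probs, vocab, verbalizer)
    return max(scores, key=scores.get)


# Toy example: a five-token vocabulary and made-up mask probabilities.
vocab = {"physics": 0, "quantum": 1, "biology": 2, "protein": 3, "the": 4}
probs = [0.10, 0.40, 0.05, 0.05, 0.40]
mask_log_probs = [math.log(p) for p in probs]

# Hypothetical enriched verbalizer: class -> (label term, correlation score).
verbalizer = {
    "quant-ph": [("physics", 0.6), ("quantum", 0.9)],
    "q-bio": [("biology", 0.8), ("protein", 0.7)],
}

predicted = predict(mask_log_probs, vocab, verbalizer)
print(predicted)
```

In a real setup the log-probabilities would come from a masked language model applied to a prompt template, and the label terms and scores from the knowledge retrieval and filtering steps the abstract describes; both are replaced by hard-coded toy values here.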