AudioBERT: Audio Knowledge Augmented Language Model

Abstract

Recent studies have found that language models pretrained on text-only datasets often lack elementary visual knowledge, e.g., the colors of everyday objects. Motivated by this observation, we ask whether a similar shortcoming exists for auditory knowledge. To answer this question, we construct a new dataset called AuditoryBench, which consists of two novel tasks for evaluating auditory knowledge. Our analysis using the benchmark shows that language models also suffer from a severe lack of auditory knowledge. To address this limitation, we propose AudioBERT, a novel method that augments the auditory knowledge of BERT through a retrieval-based approach. First, we detect auditory knowledge spans in prompts so that we can query our retrieval model efficiently. Then, we inject audio knowledge into BERT and switch on low-rank adaptation for effective adaptation when audio knowledge is required. Our experiments demonstrate that AudioBERT is effective, achieving superior performance on AuditoryBench. The dataset and code are available at https://github.com/HJ-Ok/AudioBERT.
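The three steps named in the abstract (span detection, retrieval, and low-rank adaptation that is switched on only when audio knowledge is needed) can be illustrated with a toy, standard-library-only sketch. Everything in it is an assumption made for illustration and is not taken from the AudioBERT code: the keyword-based `detect_auditory_span`, the hand-written `AUDIO_INDEX`, the element-wise addition used as "injection", and the `GatedLowRankLayer` class are hypothetical stand-ins for the learned components. Only the gating idea, a frozen base weight plus a low-rank update B(Ax) applied when a span is found, follows the description above.

```python
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Hypothetical stand-in for a learned span detector: a fixed keyword set.
AUDITORY_TERMS = {"meow", "bark", "siren"}

# Hypothetical retrieval index mapping a span to a made-up audio embedding.
AUDIO_INDEX: Dict[str, Vector] = {
    "meow": [1.0, 0.0],
    "bark": [0.0, 1.0],
    "siren": [1.0, 1.0],
}


def detect_auditory_span(prompt: str) -> Optional[str]:
    """Return the first auditory keyword in the prompt, or None."""
    for token in prompt.lower().split():
        word = token.strip(".,?!")
        if word in AUDITORY_TERMS:
            return word
    return None


def retrieve_audio_embedding(span: str) -> Vector:
    """Look up the (toy) audio embedding for a detected span."""
    return AUDIO_INDEX[span]


class GatedLowRankLayer:
    """Frozen weight W plus a low-rank update B @ A used only with audio."""

    def __init__(self, weight: Matrix, lora_a: Matrix, lora_b: Matrix):
        self.weight = weight  # d_out x d_in, never updated
        self.lora_a = lora_a  # r x d_in
        self.lora_b = lora_b  # d_out x r

    def forward(self, x: Vector, audio: Optional[Vector]) -> Vector:
        if audio is None:
            # No auditory span: behave exactly like the base model.
            return matvec(self.weight, x)
        # Assumption: "injection" is modeled as element-wise addition.
        x_aug = [xi + ai for xi, ai in zip(x, audio)]
        base = matvec(self.weight, x_aug)
        delta = matvec(self.lora_b, matvec(self.lora_a, x_aug))
        return [b + d for b, d in zip(base, delta)]


def run(prompt: str, x: Vector, layer: GatedLowRankLayer) -> Vector:
    """Detect a span, retrieve audio if one is found, then apply the layer."""
    span = detect_auditory_span(prompt)
    audio = retrieve_audio_embedding(span) if span is not None else None
    return layer.forward(x, audio)


layer = GatedLowRankLayer(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[0.5], [0.5]],
)
print(run("The sky is blue.", [1.0, 2.0], layer))
print(run("The cat says meow.", [1.0, 2.0], layer))
```

With no auditory span the layer returns the base model's output unchanged, which is the point of gating: prompts that need no audio knowledge are unaffected by the adapter.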
