ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Abstract

Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their performance is highly sensitive to the prompts used. This variability poses challenges for accurate assessment and user satisfaction. Current research frequently overlooks instance-level prompt variations and their implications for subjective evaluations. To address these shortcomings, we introduce ProSA, a framework designed to evaluate and understand prompt sensitivity in LLMs. ProSA incorporates a novel sensitivity metric, PromptSensiScore, and leverages decoding confidence to elucidate the underlying mechanisms. Our extensive study, spanning multiple tasks, finds that prompt sensitivity varies across datasets and models, with larger models exhibiting greater robustness. We observe that few-shot examples can alleviate this sensitivity, and that subjective evaluations are also susceptible to prompt sensitivity, particularly in complex, reasoning-oriented tasks. Furthermore, our findings indicate that higher model confidence correlates with increased prompt robustness. We believe this work will serve as a helpful tool for studying the prompt sensitivity of LLMs. The project is released at: https://github.com/open-compass/ProSA

