Retrospective Learning from Interactions

Abstract

Multi-turn interactions between large language models (LLMs) and users naturally include implicit feedback signals. If an LLM responds in an unexpected way to an instruction, the user is likely to signal it by rephrasing the request, expressing frustration, or pivoting to an alternative task. Such signals are task-independent and occupy a relatively constrained subspace of language, allowing the LLM to identify them even if it fails on the actual task. This creates an avenue for continually learning from interactions without additional annotations. We introduce ReSpect, a method to learn from such signals in past interactions via retrospection. We deploy ReSpect in a new multimodal interaction scenario, where humans instruct an LLM to solve an abstract reasoning task with a combinatorial solution space. Through thousands of interactions with humans, we show how ReSpect gradually improves task completion rate from 31% to 82%, all without any external annotation.

