Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

Abstract

While the ability of language models to recall facts has been widely investigated, how they handle temporally changing facts remains underexplored. Through circuit analysis, we discover Temporal Heads, specific attention heads primarily responsible for processing temporal knowledge. We confirm that these heads are present across multiple models, though their specific locations may vary, and that their responses differ depending on the type of knowledge and its corresponding years. Disabling these heads degrades the model's ability to recall time-specific knowledge while preserving its general capabilities, leaving time-invariant and question-answering performance intact. Moreover, the heads are activated not only by numeric conditions ("In 2004") but also by textual aliases ("In the year ..."), indicating that they encode a temporal dimension beyond simple numerical representation. Furthermore, we extend these findings by demonstrating that temporal knowledge can be edited by adjusting the values of these heads.
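The head-disabling experiment mentioned in the abstract can be illustrated with a minimal sketch of zero-ablation on a toy attention layer. This is not the authors' code: the abstract does not say which models, which heads, or which ablation variant (zeroing versus mean replacement) were used, so the function name `multi_head_attention`, the `head_mask` parameter, the random weights, and the choice of zero-ablation are all assumptions made for illustration.

```python
import numpy as np

def multi_head_attention(x, w_q, w_k, w_v, w_o, head_mask=None):
    """Toy causal multi-head self-attention (hypothetical helper).

    x:             (seq, d_model) input activations
    w_q, w_k, w_v: (n_heads, d_model, d_head) per-head projections
    w_o:           (n_heads, d_head, d_model) per-head output projections
    head_mask:     (n_heads,) of 0/1; a 0 zero-ablates that head's output
    """
    n_heads, _, d_head = w_q.shape
    seq = x.shape[0]
    if head_mask is None:
        head_mask = np.ones(n_heads)
    causal = np.tril(np.ones((seq, seq), dtype=bool))
    out = np.zeros_like(x)
    for h in range(n_heads):
        q, k, v = x @ w_q[h], x @ w_k[h], x @ w_v[h]
        scores = (q @ k.T) / np.sqrt(d_head)
        scores = np.where(causal, scores, -np.inf)  # block attention to future tokens
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Each head adds its own term to the output; the mask removes that term.
        out += head_mask[h] * (weights @ v @ w_o[h])
    return out

# Random toy weights (assumption: stand-ins for a trained model's parameters).
rng = np.random.default_rng(0)
seq, d_model, n_heads, d_head = 5, 8, 4, 2
x = rng.normal(size=(seq, d_model))
w_q, w_k, w_v = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
w_o = rng.normal(size=(n_heads, d_head, d_model))

mask = np.ones(n_heads)
mask[2] = 0.0  # pretend head 2 is the head under study

full = multi_head_attention(x, w_q, w_k, w_v, w_o)
ablated = multi_head_attention(x, w_q, w_k, w_v, w_o, head_mask=mask)
head2_only = multi_head_attention(x, w_q, w_k, w_v, w_o, head_mask=1.0 - mask)

# The layer output changes by exactly the ablated head's contribution.
print(np.allclose(full - ablated, head2_only))
```

In a real study, the same masking would be applied inside a trained model, and the effect would be measured on time-specific prompts versus time-invariant ones. The toy layer only shows the mechanism: because head outputs are summed, masking a head removes its contribution and nothing else.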
