Reversed Attention: On The Gradient Descent Of Attention Layers In GPT


Authors

Shahar Katz, Lior Wolf
Project Resources

Name                     Type               Source
ArXiv Paper              Paper              arXiv
Semantic Scholar Paper   Paper              Semantic Scholar
GitHub Repository        Code Repository    GitHub
Abstract

The success of Transformer-based Language Models (LMs) stems from their attention mechanism. While this mechanism has been extensively studied in explainability research, particularly through the attention values obtained during the forward pass of LMs, the backward pass of attention has been largely overlooked. In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as "Reversed Attention". We examine the properties of Reversed Attention and demonstrate its ability to elucidate the models' behavior and edit dynamics. In an experimental setup, we showcase the ability of Reversed Attention to directly alter the forward pass of attention, without modifying the model's weights, using a novel method called "attention patching". In addition to enhancing the comprehension of how LMs configure attention layers during backpropagation, Reversed Attention maps contribute to a more interpretable backward pass.
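
The abstract's central claim, that the backward pass implicitly computes a second attention-like matrix, can be made concrete with a small PyTorch sketch. The sketch below is an illustrative assumption, not the authors' implementation (see the GitHub repository above for that): it builds a toy single-head attention with made-up dimensions and a toy loss, backpropagates, and reads off the gradient that reaches the forward-pass attention map.

```python
# Minimal sketch, assuming a toy single-head attention; dimensions, projections,
# and the loss are illustrative choices, not taken from the paper.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_model = 5, 16

# Random hidden states and projection matrices.
x = torch.randn(seq_len, d_model, requires_grad=True)
W_q = torch.randn(d_model, d_model) / d_model ** 0.5
W_k = torch.randn(d_model, d_model) / d_model ** 0.5
W_v = torch.randn(d_model, d_model) / d_model ** 0.5

q, k, v = x @ W_q, x @ W_k, x @ W_v
scores = q @ k.T / d_model ** 0.5          # raw attention scores
attn = F.softmax(scores, dim=-1)           # forward-pass attention map
attn.retain_grad()                         # keep the gradient on this non-leaf tensor
out = attn @ v

# Any scalar objective will do; a toy loss stands in for the LM loss here.
loss = out.pow(2).mean()
loss.backward()

# The gradient that backpropagation sends to the attention map is itself a
# seq_len x seq_len matrix, i.e. the kind of quantity the paper studies as
# "Reversed Attention".
reversed_attention = attn.grad
print(attn.shape, reversed_attention.shape)   # torch.Size([5, 5]) for both
```

Attention patching, as described in the abstract, would then feed maps like this back into the attention computation of a forward pass directly, rather than updating any weights; the exact procedure is the paper's and is not reproduced in this sketch.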
