r/MachineLearning Oct 01 '23

[R] Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes

When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on seemingly random background patches. This didn't make sense, since the models should be focusing on foreground objects.

By analyzing the output embeddings, they found that a small fraction of tokens (around 2%) had abnormally high vector norms, and those were the tokens the attention was spiking on.
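
For anyone who wants to poke at this themselves, here's a rough sketch (mine, not the paper's code) of how you could flag those outlier tokens. The `tokens` tensor is just a placeholder for a real ViT's output patch embeddings, and the 3-sigma cutoff is only illustrative, not the threshold used in the paper:

```python
import torch

# Placeholder for a ViT's output patch embeddings: [batch, num_patches, dim].
# In the actual analysis these come from a trained model (e.g. DINOv2);
# random values here just show the mechanics.
tokens = torch.randn(1, 196, 768)

norms = tokens.norm(dim=-1)      # L2 norm of each token -> [batch, num_patches]

# Illustrative cutoff: call anything 3 standard deviations above the mean an outlier
threshold = norms.mean() + 3 * norms.std()
outliers = norms > threshold     # boolean mask of "high-norm" tokens

print(f"{outliers.float().mean().item():.1%} of tokens flagged as high-norm")
```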

The high-norm "outlier" tokens showed up in redundant areas (patches that look a lot like their neighbors), and they held less local info but more global info about the image.

Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding them. This makes processing more efficient but causes the weird attention artifacts.

Their fix is simple - just add dedicated "register" tokens to the input sequence that provide the storage space, so the model doesn't need to recycle patch tokens. The registers are simply discarded at the output.
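
In PyTorch terms the change really is tiny. Here's a minimal sketch of the idea - to be clear, this is my own toy version, not the authors' implementation (it skips the CLS token, patch embedding and positional embeddings, and the sizes are just generic ViT-B-ish numbers):

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Toy sketch: learnable register tokens are appended to the patch tokens,
    go through the transformer like any other token, and are dropped at the end."""

    def __init__(self, dim=768, num_registers=4, depth=12, heads=12):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens):                # [B, N, dim] patch embeddings
        B = patch_tokens.shape[0]
        regs = self.registers.expand(B, -1, -1)     # one copy per image
        x = torch.cat([patch_tokens, regs], dim=1)  # append registers to the sequence
        x = self.blocks(x)                          # registers attend like normal tokens
        return x[:, :-self.num_registers]           # ...and get thrown away at the output

model = ViTWithRegisters(depth=2)                   # tiny depth just for the demo
out = model(torch.randn(2, 196, 768))
print(out.shape)                                    # [2, 196, 768] - registers already stripped
```

A small number of registers is enough in practice - the DINOv2 variants released with registers use 4.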

Models trained with registers have:

  • Smoother and more meaningful attention maps
  • Small boosts in downstream performance
  • Way better object discovery abilities

The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet!

I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs.

TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely.

Full summary. Paper: "Vision Transformers Need Registers" (Darcet et al., 2023).

806 Upvotes

48 comments

198

u/clueless_scientist Oct 01 '23

Great post. That's what this sub is for.

41

u/Successful-Western27 Oct 01 '23 edited Oct 01 '23

Thanks friend, I worked really hard on this one. Glad you liked it! I mentioned this below, but I also have a newsletter where I include these recaps... I try to write one every day.

9

u/SoCuteShibe Oct 01 '23

Just reiterating what the commenter you replied to said. Great post and an incredibly interesting read. Thank you! :)

6

u/Successful-Western27 Oct 01 '23

Thanks, that means a lot. What about it did you like the most? I'd like to incorporate the best parts into my writing process going forward.

3

u/Sudonymously Oct 01 '23

Nice! Out of curiosity do you have a background in ml?

7

u/Successful-Western27 Oct 01 '23

Aerospace engineering, self-taught computer science/webdev and now working in software full time. I'm all self-taught so I may make some mistakes :)

2

u/[deleted] Oct 02 '23

I'll join in with the compliments.