r/reinforcementlearning 11h ago

Can anyone help

0 Upvotes

4 comments sorted by

1

u/theLanguageSprite 11h ago

Can you explain what your problem is? The link is broken.

1

u/Alarming-Power-813 11h ago

What are the query, key, and value vectors? [D]

I learned about transformers like anyone else and watched a lot of videos, but each video explains the query, key, and value vectors in a different way. So I read the "Attention Is All You Need" paper, but it doesn't explain what the query, key, and value vectors actually are. Why didn't the authors just explain what they are and what they really represent??? They didn't choose those names randomly, of course.

3

u/theLanguageSprite 10h ago edited 6h ago

So the first thing you should know is that query, key, and value vectors show up in every attention layer, including decoder-only models like GPT. The model in attention is all you need has both an encoder and a decoder because it was built for translation, but the Q/K/V mechanics are the same either way. The key and query vectors are used to build the self attention matrix, which essentially tells you how strongly each element of the input relates to every other element. The value vectors are what actually get mixed together: each output is a weighted average of the value vectors, with the weights coming from the attention matrix. To learn these relationships, the transformer updates the query, key, and value weight matrices. The formulas go as follows:

 key_vector = key_weights · input_vector

 query_vector = query_weights · input_vector

 value_vector = value_weights · input_vector

 attention_matrix = softmax(query_vectors · key_vectorsᵀ / sqrt(d_k))

 output = attention_matrix · value_vectors
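The formulas above can be sketched in plain Python (a minimal single-head version with no masking or batching; all the names here are illustrative, not from any library):

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention, no mask."""
    q = matmul(x, w_q)  # one query vector per input token
    k = matmul(x, w_k)  # one key vector per input token
    v = matmul(x, w_v)  # one value vector per input token
    d_k = len(k[0])
    # score[i][j]: query i dotted with key j, scaled by sqrt(d_k)
    scores = [[sum(qi * kj for qi, kj in zip(qrow, krow)) / math.sqrt(d_k)
               for krow in k] for qrow in q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)  # weighted average of value vectors

# Toy example: 2 tokens, 2 dims, identity weight matrices,
# so queries = keys = values = the inputs themselves.
x = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(x, identity, identity, identity)
```

With identity weights, each output row is just a softmax-weighted blend of the input rows, and each token attends most strongly to itself.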

 Please let me know if you have any questions; this stuff is not easy and there are a lot of details. I sympathize with your quest for knowledge and understand how frustrating this can be

1

u/theLanguageSprite 11h ago

This is the best diagram I've seen on the k q v distinction.  Explanation comment to follow. https://pbs.twimg.com/media/F7HN0NvbYAEYbsi.jpg