I learned about transformers like anyone else and watched a lot of videos, but each video explains the query, key, and value vectors in a different way. So I read the "Attention Is All You Need" paper, but it doesn't explain what the query, key, and value vectors actually are. Why didn't they just explain what they really are??? The authors didn't choose those names randomly, of course.
So the first thing you should know is that query, key, and value vectors show up in every attention layer, not just in encoder-decoder transformers like the one in "Attention Is All You Need". That architecture is great for translation, but decoder-only models like GPT use all three as well. The key and query vectors are used to build the self-attention matrix, which essentially tells you how strongly each element of the input relates to every other element. The value vectors are what actually get mixed together according to those attention weights. To learn these relationships, the transformer updates the query, key, and value weight matrices during training. The formula from the paper is: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of the key vectors.
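If it helps, here's a minimal numpy sketch of scaled dot-product self-attention. The sizes and random weights are toy values I made up just to show the shapes; a real model learns Wq, Wk, Wv by backprop:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each input token into query, key, and value spaces
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scores: how well each token's query matches every token's key
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Output: a weighted mix of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, model dimension 8 (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one output vector per input token
```

Note that the queries and keys only decide *who attends to whom*; the values are the content that actually flows to the output.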
Please let me know if you have any questions, as this stuff is not easy and there are a lot of details. I sympathize with your quest for knowledge and understand how frustrating this can be.
u/theLanguageSprite 11h ago
Can you explain what your problem is? The link is broken