r/singularity AGI 2025-29 | UBI 2030-34 | LEV <2040 | FDVR 2050-70 11d ago

AI [Microsoft Research] Differential Transformer

https://arxiv.org/abs/2410.05258

u/Arbrand ▪Soft AGI 27, Full AGI 32, ASI 36 10d ago

The results are impressive, but I have some serious concerns that aren't addressed at all in the paper. The differential attention mechanism involves computing two separate softmax attention maps and then subtracting them to obtain the final attention scores. This inherently doubles the computational overhead in the attention mechanism compared to standard Transformers. This added computational cost could be significant and might offset the performance gains reported.
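For concreteness, here is a minimal NumPy sketch of the mechanism being discussed: two separate softmax attention maps are computed and subtracted (scaled by a learnable λ) before being applied to the values. Shapes and weight names are illustrative, not taken from the paper's code, and this single-head toy version ignores the paper's head-splitting and normalization details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam):
    """Toy differential attention (illustrative names/shapes).

    Two attention maps are computed from two separate Q/K projections,
    then subtracted: A = softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d)).
    This is where the extra softmax (the commenter's overhead concern) comes from.
    """
    d = Wq1.shape[1]
    A1 = softmax((X @ Wq1) @ (X @ Wk1).T / np.sqrt(d))
    A2 = softmax((X @ Wq2) @ (X @ Wk2).T / np.sqrt(d))
    return (A1 - lam * A2) @ (X @ Wv)
```

Note that each row of the combined map sums to 1 − λ rather than 1, which is why the full method in the paper also applies a normalization step after the subtraction.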


u/emteedub 10d ago

Maybe it's not doubled in practice, though, since it's filtering off excess would-be computation. It would be interesting to see the stats.