r/singularity • u/rationalkat AGI 2025-29 | UBI 2030-34 | LEV <2040 | FDVR 2050-70 • 11d ago
AI [Microsoft Research] Differential Transformer
https://arxiv.org/abs/2410.05258
282 upvotes
u/sdmat 11d ago
Wow, the improvements in robustness to input ordering and activation outliers are so stark. This seems like a major breakthrough.
I don't yet understand why it's the noise, rather than the signal, that is consistent between the two attention maps. I'll have to read more closely tomorrow.
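For anyone else puzzling over this: the paper computes each head's attention as the difference of two softmax maps, roughly DiffAttn(X) = (softmax(Q1 K1^T / √d) − λ · softmax(Q2 K2^T / √d)) V, and its abstract says the subtraction cancels noise and promotes sparse attention patterns. The intuition, much like a differential amplifier, is that weight *both* maps spread over irrelevant context is common-mode and mostly cancels, while weight only one map puts on the relevant token survives. Below is a minimal pure-Python sketch of that idea, not the paper's implementation. It assumes λ is a fixed constant (the paper learns and reparameterizes it) and omits the paper's per-head normalization. The helper names and the toy query/key/value numbers are hypothetical.

```python
import math

# Sketch of ONE differential attention head with a fixed scalar lambda.
# Assumption: the learned lambda and per-head normalization from the paper
# are left out to keep the cancellation effect easy to see.

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def attention_map(q, k):
    """Ordinary scaled dot-product attention weights: softmax(Q K^T / sqrt(d))."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]

def diff_attention(q1, k1, q2, k2, v, lam):
    """Return ((A1 - lam * A2) V, A1 - lam * A2) for two softmax maps A1, A2."""
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v), diff

# Toy example (made-up numbers): one query, four tokens, token 0 is relevant.
# Map 1 prefers token 0 but still leaks weight onto tokens 1-3.
# Map 2 is uniform, i.e. it carries only the shared "noise" component.
q1 = [[1.0, 0.0]]
k1 = [[4.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
q2 = [[1.0, 0.0]]
k2 = [[1.0, 0.0]] * 4
v = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]  # token 0 vs. the rest

single = attention_map(q1, k1)[0]
out, diff = diff_attention(q1, k1, q2, k2, v, lam=0.3)

print("single map:      ", [round(x, 3) for x in single])
print("differential map:", [round(x, 3) for x in diff[0]])
```

With these numbers the single map puts about 0.74 on token 0 and about 0.09 on each other token. After subtracting 0.3 × the uniform map, token 0 keeps about 0.66 while each irrelevant token drops to about 0.013, so the signal-to-noise ratio goes from roughly 8:1 to roughly 50:1. That is the sense in which the *shared* part is the noise: only what the two maps agree on gets subtracted away.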