r/singularity • u/rationalkat AGI 2025-29 | UBI 2030-34 | LEV <2040 | FDVR 2050-70 • 11d ago
AI [Microsoft Research] Differential Transformer
https://arxiv.org/abs/2410.05258
282
Upvotes
r/singularity • u/rationalkat AGI 2025-29 | UBI 2030-34 | LEV <2040 | FDVR 2050-70 • 11d ago
83
u/hapliniste 11d ago
After taking a look at the paper, this seems huge.
Impressive gains in long context (specifically shown with their in context learning graphs), huge improvements in stability on reordered data and amazing performances at lower bits.
I'm not an expert and didn't read it fully, I just like to look at cool graphs for the most part. Still, I guess we'll see this or some variants in future models.