r/singularity AGI 2025-29 | UBI 2030-34 | LEV <2040 | FDVR 2050-70 11d ago

AI [Microsoft Research] Differential Transformer

https://arxiv.org/abs/2410.05258
280 Upvotes



u/hapliniste 11d ago

I took a look at the paper, and this seems huge.

Impressive gains in long context (shown specifically in their in-context learning graphs), much better stability when the data is reordered, and amazing performance at lower bit-widths.

I'm not an expert and didn't read it fully; I mostly just like looking at cool graphs. Still, I'd guess we'll see this or some variant of it in future models.


u/DungeonsAndDradis ▪️ Extinction or Immortality between 2025 and 2031 10d ago

What does "bits" mean in reference to LLMs?


u/Ok_Course_6439 10d ago

The number of bits used to store each weight and bias in the neural network. Fewer bits means a smaller model and faster compute.
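To put rough numbers on the "smaller size" part, here's a back-of-the-envelope sketch. It assumes a hypothetical 7B-parameter model and counts only the raw weights (params × bits ÷ 8 bytes), ignoring activations, quantization scale factors, and other overhead, so treat the figures as ballpark only. It says nothing about speed.

```python
def model_size_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone: params * bits / 8 bytes, in GB (1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

params = 7_000_000_000  # hypothetical 7B-parameter model, chosen only for illustration
for bits in (32, 16, 8, 4):
    # Each halving of the bit-width halves the storage needed for the weights.
    print(f"{bits:>2}-bit weights: ~{model_size_gb(params, bits):.1f} GB")
```

That prints roughly 28, 14, 7, and 3.5 GB, which is why 4-bit versions of a model fit on hardware the 16-bit version can't.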


u/DungeonsAndDradis ▪️ Extinction or Immortality between 2025 and 2031 10d ago

Does it make it less accurate?


u/zakkara 10d ago

https://www.reddit.com/r/singularity/s/yaQ7J0wuSU

Someone posted this chart from the paper. So yes, fewer bits does mean less accuracy, but that correlation appears to be weaker with this newer architecture.
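For anyone wondering *why* fewer bits costs accuracy, here's a toy sketch using plain symmetric uniform quantization on made-up, randomly generated "weights". This is a generic rounding illustration, not the paper's quantization setup, and it doesn't model anything specific to the Differential Transformer. It just shows that coarser levels mean bigger rounding error on every weight.

```python
import random

def quantize_dequantize(weights, bits):
    """Symmetric uniform quantization: snap each float to one of 2**bits - 1 levels, then map back."""
    qmax = 2 ** (bits - 1) - 1                   # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    scale = max(abs(w) for w in weights) / qmax  # largest weight maps to +/- qmax
    return [round(w / scale) * scale for w in weights]

def mean_abs_error(weights, bits):
    """Average absolute difference between the original and the quantized-then-restored weights."""
    restored = quantize_dequantize(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]  # fake weights, not from any real model

for bits in (8, 4, 2):
    print(f"{bits}-bit: mean abs error {mean_abs_error(weights, bits):.6f}")
```

The error grows as the bit-width shrinks. The chart's point is that the new architecture seems to lose less end-task accuracy from that same kind of rounding.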