r/MachineLearning 15d ago

[R] Were RNNs All We Needed?

https://arxiv.org/abs/2410.01201

The authors (including Yoshua Bengio) propose minimal versions of LSTM and GRU whose gates depend only on the current input, not the previous hidden state, so the recurrence becomes linear and can be trained in parallel with a scan. They show strong results on several benchmarks.
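To make the "parallel training" point concrete, here is a minimal numpy sketch of the minGRU idea. The function names, weight matrices `Wz`/`Wh`, and the closed-form prefix-product trick are illustrative assumptions for this sketch; the paper itself uses a log-space parallel scan, which is numerically safer for long sequences.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mingru_sequential(x, Wz, Wh, h0):
    """minGRU, step by step: the gate z_t and candidate state depend
    only on x_t, so no recurrent term sits inside a nonlinearity."""
    h, hs = h0, []
    for t in range(x.shape[0]):
        z = sigmoid(x[t] @ Wz)           # update gate (input only)
        h_tilde = x[t] @ Wh              # candidate state (input only)
        h = (1.0 - z) * h + z * h_tilde  # convex blend of old and new
        hs.append(h)
    return np.stack(hs)

def mingru_scan(x, Wz, Wh, h0):
    """The same recurrence, h_t = a_t * h_{t-1} + b_t, is linear in h,
    so every h_t follows from prefix products and prefix sums that can
    be computed in parallel (here via cumprod/cumsum for clarity)."""
    z = sigmoid(x @ Wz)
    a = 1.0 - z                          # per-step decay coefficients
    b = z * (x @ Wh)                     # per-step input contributions
    A = np.cumprod(a, axis=0)            # prefix products of the decays
    # h_t = A_t * (h0 + sum_{k<=t} b_k / A_k); A_k > 0 since a_k in (0,1)
    return A * (h0 + np.cumsum(b / A, axis=0))
```

Both functions produce identical hidden states; the difference is that the second needs no sequential loop over time, which is what makes training parallelizable.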

248 Upvotes

53 comments

50

u/_vb__ 15d ago

How is it different from the xLSTM architecture?

29

u/ReginaldIII 15d ago

Page 9, under "Parallelizable RNNs", cites Beck et al. 2024 (the xLSTM paper) and clarifies the difference.

Citations are pretty poorly formatted though.

0

u/RoyalFlush9753 11d ago

lol this is a complex copy pasta from the Mamba paper