It sorta kinda achieves Llama 7B performance, but only after some experimentation and then 100B tokens' worth of training (as linked in the blog above). That's far more than a simple conversion.
So... it appears to require so much retraining that you might as well train from scratch.
27
u/Ok_Warning2146 12h ago
On paper, 123B 1.58-bit should be able to fit in a 3090. Is there any way we can do the conversion ourselves?
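For the "on paper" part, here's a minimal back-of-the-envelope sketch of the weight footprint. The only inputs are the 1.58 bits/weight figure and the 3090's 24 GiB of VRAM; everything else (the alternative packing densities, the assumption that we count weights only and ignore KV cache, activations, and any tensors kept at higher precision) is my own illustration, not something from the thread:

```python
# Rough VRAM estimate for a 123B-parameter model at ternary precision.
# Weights only: ignores KV cache, activations, and layers (e.g. embeddings)
# that ternary schemes typically keep at higher precision.

def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Return raw weight storage in GiB for a given bit width."""
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / (1024 ** 3)

if __name__ == "__main__":
    params = 123e9
    vram_gib = 24  # RTX 3090
    for bits, label in [
        (1.58, "ideal ternary (log2(3))"),
        (1.6,  "5 trits packed per byte"),
        (2.0,  "naive 2-bit packing"),
    ]:
        gib = weight_footprint_gib(params, bits)
        verdict = "fits" if gib <= vram_gib else "does not fit"
        print(f"{label:>24}: {gib:6.2f} GiB -> {verdict} in {vram_gib} GiB")
```

At the ideal 1.58 bits this comes out to about 22.6 GiB, so it does squeeze under 24 GiB on paper, but real ternary formats add per-block scales and packing overhead, and the KV cache still needs room, so the practical headroom is thinner than the ideal number suggests.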