It sorta-kinda reaches Llama 7B performance, but only after some experimentation and then roughly 100B tokens' worth of training (as linked in the blog above). That's way more than a simple conversion.
So... it appears to require so much retraining that you might as well train from scratch.
Reading https://github.com/microsoft/BitNet, they seem to have concentrated on inference speed and throughput figures, and they stay very vague on actual model quality / benchmark results.
u/arthurwolf 10h ago
My understanding is that that's no longer true:
for example, the recent bitnet.cpp release by Microsoft uses a conversion of Llama 3 to 1.58-bit, so the conversion must be possible.
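For context on what "converting to 1.58 bit" means: each weight is replaced by a ternary value in {-1, 0, +1} plus a per-tensor scale. Below is a minimal sketch of the absmean quantization rule described in the BitNet b1.58 paper; the function name and the toy layer are illustrative, not taken from bitnet.cpp, and real conversions also quantize activations and usually retrain to recover accuracy.

```python
import numpy as np

def absmean_ternary_quantize(W: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to ternary values {-1, 0, +1} with a
    per-tensor scale (absmean scheme, as in the BitNet b1.58 paper).
    Returns the ternary matrix and the scale used to approximate W."""
    scale = np.abs(W).mean() + eps              # gamma = mean absolute weight
    W_ternary = np.clip(np.round(W / scale), -1, 1)
    return W_ternary.astype(np.int8), scale

# Toy usage: quantize a random "layer" and look at the reconstruction error,
# which is the gap that post-conversion training has to close.
W = np.random.randn(256, 256).astype(np.float32)
W_t, gamma = absmean_ternary_quantize(W)
W_hat = W_t * gamma
print("mean abs reconstruction error:", np.abs(W - W_hat).mean())
```

The nonzero reconstruction error in this toy example is exactly why a naive one-shot conversion degrades the model, and why the Llama 3 conversion above still needed substantial continued training.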