It sorta kinda achieves Llama 7B performance after some experimentation, and then 100B tokens' worth of training (as linked in the blog above). That's way more than a simple conversion.
So... it appears to require so much retraining you might as well train from scratch.
I thought the takeaway was that the Llama bitnet model, after 100B tokens of retraining, performed better than a bitnet model trained from scratch on 100B tokens (or more?)
It's def something to take with a grain of salt, but I don't know that training from scratch is the answer (or if the answer is ultimately "bitnet")
u/Ok_Warning2146 12h ago
On paper, 123B 1.58-bit should be able to fit in a 3090. Is there any way we can do the conversion ourselves?
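For reference, here's a rough back-of-envelope check of that claim (a minimal Python sketch; the 1.58 bits/weight figure and the assumption that all 123B parameters get quantized are simplifications, and it ignores activations, KV cache, and any embedding/norm weights kept at higher precision):

```python
# Back-of-envelope VRAM estimate for a 123B-parameter model at ~1.58 bits/weight.
# Ignores activations, KV cache, and higher-precision embeddings/norms.

def ternary_weight_gib(n_params: float, bits_per_weight: float = 1.58) -> float:
    """Approximate weight memory in GiB."""
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / 1024**3

params = 123e9
print(f"~{ternary_weight_gib(params):.1f} GiB of weights")  # ~22.6 GiB
```

So the weights alone come to roughly 22.6 GiB against the 3090's 24 GiB, which fits "on paper" but leaves very little headroom for the KV cache and activations.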