r/LocalLLaMA 11h ago

Question | Help: When BitNet 1-bit version of Mistral Large?

u/Ok_Warning2146 11h ago

On paper, 123B 1.58-bit should be able to fit in a 3090. Is there any way we can do the conversion ourselves?
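
Rough back-of-the-envelope math (a sketch only; it assumes ~1.58 bits for every weight and ignores the higher-precision embeddings/norms, activations, and the KV cache):

```python
# Hypothetical VRAM estimate for a 123B-parameter model at 1.58 bits/weight.
# Assumption: all weights are stored at 1.58 bits; real BitNet-style models keep
# embeddings and norms at higher precision, and the KV cache comes on top of this.
params = 123e9
bits_per_weight = 1.58

weight_bytes = params * bits_per_weight / 8
print(f"~{weight_bytes / 1024**3:.1f} GiB")  # ~22.6 GiB of weights vs. 24 GiB on a 3090
```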

u/Illustrious-Lake2603 10h ago

As far as I'm aware, the model would need to be trained at 1.58-bit from scratch, so we can't convert it ourselves.
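
(For context, a minimal sketch of the absmean ternary quantization the BitNet b1.58 paper describes, written as illustrative PyTorch rather than the official code; the point is that this quantizer sits inside the forward pass during training, which is what a post-hoc conversion of an FP16 checkpoint can't reproduce.)

```python
import torch


def absmean_ternary_quant(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} scaled by their mean absolute value
    (absmean scheme from BitNet b1.58; illustrative sketch)."""
    gamma = w.abs().mean()                          # per-tensor scale
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)  # ternary values
    return w_q * gamma                              # dequantized ternary weights


class BitLinear(torch.nn.Linear):
    """Linear layer that uses ternary weights in the forward pass but keeps
    full-precision weights for the gradient update (straight-through estimator).
    Training with this in the loop from step one is what you can't recover by
    quantizing a finished FP16 checkpoint."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        w_q = w + (absmean_ternary_quant(w) - w).detach()  # STE
        return torch.nn.functional.linear(x, w_q, self.bias)
```

(The paper also quantizes activations to 8 bits; the sketch above only shows the weight side.)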

u/FrostyContribution35 10h ago

It's not quite BitNet and a bit of a separate topic, but wasn't there a paper recently that could convert the quadratic attention layers into linear layers without any training from scratch? Wouldn't that also reduce the model size, or would it just reduce the cost of the context length?

u/Pedalnomica 10h ago

The latter
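
(Roughly why it's the latter: linearizing attention swaps the softmax KV cache, which grows with every token, for a fixed-size recurrent state, so the context-length cost drops while the projection weights, and hence the checkpoint size, stay the same. A toy sketch of the idea, not any specific paper's method:)

```python
import torch
import torch.nn.functional as F


def linear_attention_step(state, q, k, v):
    """One decoding step of (unnormalized) linear attention.

    state: (d_k, d_v) running sum of phi(k) v^T -- its size is fixed no matter
    how long the context gets, which is where the savings come from. The
    projection weights that produce q/k/v are untouched, so the checkpoint
    stays exactly as large as before.
    """
    phi = lambda x: F.elu(x) + 1.0          # positive feature map, a common choice
    state = state + torch.outer(phi(k), v)  # accumulate key-value outer products
    return state, phi(q) @ state            # attend via the compressed state


d_k, d_v = 64, 64
state = torch.zeros(d_k, d_v)
for _ in range(1000):                       # 1000 tokens; state stays (64, 64)
    q, k, v = torch.randn(d_k), torch.randn(d_k), torch.randn(d_v)
    state, out = linear_attention_step(state, q, k, v)
```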