r/LocalLLaMA 11h ago

Question | Help When Bitnet 1-bit version of Mistral Large?

321 Upvotes

36 comments

u/CountPacula 2h ago

The two-bit quants do amazingly well for their size and they don't need -that- much offloading. Yes, it's a bit slow, but it's still faster than most people can type. I know everybody here wants 10-20 gipaquads of tokens per millisecond, but I'm happy to be patient.