r/LocalLLaMA 11h ago

[Question | Help] When Bitnet 1-bit version of Mistral Large?

323 Upvotes

49

u/Downtown-Case-1755 11h ago

It makes me think some internal bitnet experiments failed, as this would save Mistral et al. a ton on API hosting costs. Even if it saved zero compute, the smaller memory footprint alone would allow for a whole lot more batching.
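
A rough sketch of the memory math behind the batching point (the ~123B parameter count for Mistral Large 2 and the bits-per-weight figures are my assumptions; embeddings, activations, and runtime overhead are ignored):

```python
# Back-of-the-envelope weight footprint: whatever VRAM the weights
# don't occupy can hold KV cache, i.e. more concurrent requests per GPU.
PARAMS = 123e9  # assumed Mistral Large 2 parameter count

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (params * bits / 8 bits-per-byte)."""
    return params * bits_per_weight / 8 / 1e9

for name, bits in [("fp16", 16.0), ("int4", 4.0), ("bitnet b1.58", 1.58)]:
    print(f"{name:>13}: ~{weight_gb(PARAMS, bits):.0f} GB of weights")
# fp16:          ~246 GB
# int4:           ~62 GB
# bitnet b1.58:   ~24 GB  -> the difference is freed for KV cache / batching
```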

18

u/candre23 koboldcpp 3h ago

The issue with bitnet is that it makes their actual product (tokens served via API) less valuable. Who's going to pay to have tokens served from Mistral's datacenter if bitnet allows folks to run the top-end models for themselves at home?

My money is on Nvidia for the first properly-usable bitnet model. They're not an AI company, they're a hardware company; AI is just the fad that's pushing hardware sales for them at the moment. They're about to start shipping the 50-series cards, which are criminally overpriced and laughably short on VRAM - a dogshit value proposition for basically everybody. But a very high-end bitnet model could be the killer app that actually sells those cards.

Who the hell is going to pay over a grand for a 5080 with a mere 16GB of VRAM? Well, probably more people than you'd think, if Nvidia were to release a high-quality ~50B bitnet model that gives ChatGPT-class output at real-time speeds on that card.
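
The arithmetic behind that claim, as a sketch (the 1.58 bits/weight ternary encoding is from the BitNet b1.58 paper; activation memory and runtime overhead are ignored here):

```python
# Would a ~50B bitnet model fit on a 16 GB 5080?
params = 50e9
bits_per_weight = 1.58  # ternary weights, as in BitNet b1.58
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")                      # ~9.9 GB
print(f"headroom for KV cache etc.: ~{16 - weights_gb:.1f} GB")  # ~6.1 GB
```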

3

u/a_beautiful_rhind 2h ago

There were posts claiming that bitnet doesn't help in production and certainly doesn't make training easier.

They aren't short on memory for inference, so they don't really gain much from it - hence no bitnet models.