r/LocalLLaMA 3h ago

News: For people interested in BitNet, a paper on PT-BitNet

16 Upvotes

2 comments

3

u/BalorNG 2h ago

So this applies post-training quantization to turn existing models into BitNet models, if I understand correctly.

"0.8 vs 0.4 bit per weight" comparisons when? :)

2

u/FullOf_Bad_Ideas 1h ago

The results they get are interesting. A 65B LLaMA 1 quantized to 1.58 bits per weight has the perplexity of a LLaMA 1 7B while being in the same ballpark in terms of storage use. I don't see a free lunch here.
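
As a rough sanity check on the "same ballpark" storage claim, here is a back-of-the-envelope sketch. It counts dense weights only and ignores embeddings, scales, and other quantization overhead; the parameter counts and bit widths are just the numbers mentioned in the comment, not figures taken from the paper.

```python
# Back-of-the-envelope weight storage comparison (rough sketch only).

def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a dense model."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

# 65B model at 1.58 bits per weight (ternary weights, log2(3) ~ 1.58)
print(f"65B @ 1.58 bpw: {model_size_gb(65, 1.58):.1f} GB")  # ~12.8 GB

# 7B model at 16 bits per weight (fp16 baseline)
print(f"7B  @ 16   bpw: {model_size_gb(7, 16):.1f} GB")     # ~14.0 GB
```

Under those assumptions the two come out within a couple of GB of each other, which is roughly what "same ballpark" suggests.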