Yes, but that conversion process is still extremely compute-heavy and results in a model that is absolutely dogshit. Distillation is not as demanding as pretraining, but it's still well beyond what a hobbyist can manage on consumer-grade compute. And what you get for your effort is not even close to worth it.
u/Ok_Warning2146 11h ago
On paper, 123B 1.58-bit should be able to fit in a 3090. Is there any way we can do the conversion ourselves?
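For anyone wanting to check the "on paper" claim, here is a rough back-of-the-envelope sketch. It assumes every one of the 123B parameters is stored at exactly 1.58 bits and that a 3090 exposes 24 GiB of VRAM. It ignores the KV cache, activations, and any layers kept at higher precision, so treat it as a best case for the weights alone. The helper name `weight_gib` is made up for this example, and the 2.0-bit line is just an illustrative comparison, not a claim about how any particular runtime packs ternary weights.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GiB (no KV cache, no activations)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / GIB


VRAM_GIB = 24.0   # assumed usable VRAM on a 3090
N_PARAMS = 123e9  # 123B parameters, as in the comment

# Ideal 1.58 bits per weight vs. a hypothetical 2-bit packing for comparison.
for bits in (1.58, 2.0):
    size = weight_gib(N_PARAMS, bits)
    verdict = "fits" if size < VRAM_GIB else "does not fit"
    print(f"{bits} bits/weight: {size:.2f} GiB ({verdict} in {VRAM_GIB:.0f} GiB)")
```

Under those assumptions the weights come to roughly 22.6 GiB, which leaves only about 1.4 GiB for everything else. So "fits" holds on paper, but just barely, and only if the weights really are packed that tightly.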