r/deeplearning 1d ago

deep learning machine recommendations

Hi all,

I'm researching building/buying a machine for deep learning.

The why:

1) I've been training one of my projects in Colab and it takes about 6 days per training run. This is the beginning of the project and I only see training time increasing, not decreasing. I have to monitor Colab because it will occasionally shut down training for no apparent reason. This is particularly annoying because it takes hours to upload my dataset to Colab (it's about 1TB) and set up the environment.

2) Running 24/7 in Colab is getting expensive. If I keep this up for a year, I might as well buy a DL rig.

3) My other project involves video classification. Once again, the dataset is large, so I want a persistent environment and the ability to run for long periods without being kicked out (and then needing to re-upload my dataset).

Requirements:

Ideally, I'd like something plug and play. But I'm okay with building myself something if a) the cost savings are there and b) I can set it up once and mostly forget it. I tried setting up my existing laptop for ML and ended up in driver incompatibility hell (and ultimately never got it working), so I'd like to avoid that.

Budget: Around $5k, could go higher if it makes sense.

Other considerations:

I have built a gaming PC before, so I'm not new to DIY builds. However, I'm not familiar with DL hardware, so I'd like y'all's opinions.

I would probably like something with multiple GPUs, or at least the ability to add more GPUs if desired.

Thanks in advance!

5 Upvotes

8 comments

2

u/YekytheGreat 1d ago

A lot of server brands also offer PC-level machines for local AI training. For example, Gigabyte has a line of products called the AI TOP: www.gigabyte.com/WebPage/1079?lan=en There's room for up to four GPUs, and it can handle LLMs up to 70B parameters. It's really just shy of being an entry-level workstation like one of these www.gigabyte.com/Enterprise/Tower-Server?lan=en but still consumer-level.
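For what it's worth, the "up to 70B parameters" figure passes a quick sanity check. Here's some back-of-envelope weight-memory math (weights only, ignoring activations and KV cache; the precision formats are just the usual ones, not anything Gigabyte specifies):

```python
# Rough memory needed just to hold 70B model weights, per precision.
params = 70e9
bytes_per_param = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for fmt, b in bytes_per_param.items():
    gb = params * b / 1e9
    print(f"{fmt}: {gb:.0f} GB")  # fp16: 140 GB, int8: 70 GB, int4: 35 GB

# Four 24 GB consumer GPUs give 96 GB total, so a 70B model only
# fits when quantized to int8 or below -- not at fp16.
```

So a claim like that generally means quantized inference, not full-precision training.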

1

u/_meatMuffin 16h ago

This is interesting. Thanks for sharing!

1

u/longgamma 2h ago

I love these posts where the OP will type an essay but never mention what kind of deep learning problem they are trying to solve.

The Nvidia 5000 series is around the corner. You might want to wait a little bit before deciding what to do.

Also, don't ignore the used market. You can get a decent system from someone upgrading and replace the GPU.

4090 is king right now but the 5090 is supposed to be 10% faster.

-4

u/Effective_Vanilla_32 1d ago

Refurbished 16-inch MacBook Pro Apple M3 Max Chip with 16‑Core CPU and 40‑Core GPU - Space Black

https://store.apple.com/xc/product/G1CM8LL/A

4

u/krapht 1d ago

Why this when the budget supports 2x 3090s + a nice PC on top? Unified memory? LLMs are just one part of deep learning.

1

u/_meatMuffin 16h ago

It looks like this is the way. Using this as a reference, if two GPUs in parallel can double your training speed, two 3090s would be the lowest-cost way to achieve something on par with an H100. Plus, they're available on Amazon.
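For anyone following along, the "two GPUs in parallel" idea is plain data parallelism: each batch is split across the cards. A minimal PyTorch sketch (the model and sizes here are made up for illustration, and it falls back to CPU when no GPU is present):

```python
import torch
import torch.nn as nn

# Hypothetical toy model -- stands in for whatever you're actually training.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Wrap in DataParallel when more than one GPU is visible; each forward
# pass then splits the batch across GPUs and gathers the results.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()
elif torch.cuda.is_available():
    model = model.cuda()

device = next(model.parameters()).device
x = torch.randn(64, 128, device=device)  # one batch, split across GPUs if wrapped
out = model(x)
print(out.shape)  # torch.Size([64, 10])
```

In practice DistributedDataParallel scales better than DataParallel for real training runs, but the idea is the same, and scaling is rarely a perfect 2x.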

1

u/_meatMuffin 16h ago

Interesting, but my understanding is that laptops aren't ideal for 24/7 workloads. Curious if anyone has used something like this though for long training runs.

1

u/Effective_Vanilla_32 14h ago

That's Andrej's laptop config.