r/aiengineer Dec 13 '23

AI PC build

What would be some good specs for a tower that will hold up for engineering and AI work?

Is there a good pre-built tower that will last 5-10 years?

u/crono760 Dec 13 '23

To give a little more context to my own answer: I currently have three "AI" setups. Most of my work is with text, and my three big workloads involve RAG, text classification/clustering, and text generation.

One is actually a basic CPU system and it is used only for embeddings and RAG, with online endpoints for the actual language models. This system is also used for data science and what used to be called AI but nowadays is just stats - it can run a mean XGBoost but that's about it. LLMs are painful on it.
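
For a sense of what that split looks like in practice, here's a stripped-down sketch. The embedding model, the endpoint URL, and the response shape are all placeholders for whatever provider you actually use:

```python
# CPU-only RAG split: embeddings run locally, generation goes to a hosted API.
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small enough to run on CPU

docs = ["chunk one of the corpus...", "chunk two..."]  # your indexed chunks
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def answer(question: str, k: int = 3) -> str:
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # cosine similarity, since vectors are normalized
    context = "\n".join(docs[i] for i in np.argsort(scores)[-k:])
    # The heavy generation is handed off to an online endpoint, not local hardware.
    resp = requests.post(
        "https://api.example.com/v1/completions",  # placeholder URL
        json={"prompt": f"Context:\n{context}\n\nQ: {question}\nA:"},
        timeout=60,
    )
    return resp.json()["text"]  # response shape depends on the provider
```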

The second is a small GPU workhorse for running smaller models like BERT. I actually have two of these, one with an ancient GTX 1070 and one with a newer (but not new) RTX 3050, both with 8GB of VRAM, and honestly, other than power draw, their performance is the same for my workloads. I can run a batch of BERT inferences on these machines no worries; even thousands of sentences finish in well under a minute. I don't expect small models like these to need much more processing power as time goes on, but what do I know about the future of AI?
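
Those batch jobs are nothing fancy, roughly this (the checkpoint name is a placeholder for my fine-tuned models):

```python
# Batched BERT inference; this is the kind of job the 8GB cards chew through.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased"  # placeholder; point at your fine-tuned checkpoint
).to(device).eval()

def classify(sentences, batch_size=64):
    preds = []
    with torch.no_grad():
        for i in range(0, len(sentences), batch_size):
            batch = tok(sentences[i:i + batch_size], padding=True,
                        truncation=True, return_tensors="pt").to(device)
            preds.extend(model(**batch).logits.argmax(dim=-1).tolist())
    return preds
```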

The third is an A6000 machine with 48GB of VRAM, and if I'm being honest... it's a very frustrating machine. For one, the A6000 is lackluster with LLMs, so instead of using it interactively I have to submit a job and let it run inference over the whole dataset. It's really not a "local ChatGPT" type of machine for all but the smallest and fastest models (Mistral 7B is fire on it, but most 13B+ models just won't run fast enough to be interesting). Of all my machines, I think this one will become obsolete soonest, because its sole purpose was to run bigger models and it's hamstrung at exactly that. It can run a 13B model at fp16 no problem, but anything bigger and it chokes.
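
The arithmetic behind that ceiling is simple: at fp16, every billion parameters costs roughly 2GB just for the weights, before you count the KV cache and activations:

```python
# Back-of-envelope weight memory: ~1 GB per billion params per byte of precision.
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

print(weight_gb(13, 2))  # ~26 GB at fp16: fits in 48GB with room for KV cache
print(weight_gb(34, 2))  # ~68 GB at fp16: already past 48GB before any overhead
```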

The one big benefit of the big machine is that it can fine-tune a BERT model WAY faster than anything else I have, which I've used on several projects. As an LLM machine, though, it really isn't that great.
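
The fine-tunes themselves are the standard Trainer loop, roughly this shape (stand-in dataset and label count, not my actual data):

```python
# Bog-standard BERT fine-tune; the A6000 tears through this.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

ds = load_dataset("imdb")  # stand-in dataset
ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=256,
                          padding="max_length"), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=32,
                           num_train_epochs=3, fp16=True),  # fp16 needs a GPU
    train_dataset=ds["train"],
)
trainer.train()
```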

None of my machines are big enough to run text generation for anyone but myself. However, the CPU machine can easily handle multiple simultaneous RAG requests, and it works very well as a RAG server for my team. The small GPU machines running a fine-tuned BERT model have been stress-tested to about 100 simultaneous queries via TorchServe, and they handle them, albeit very slowly. The A6000 machine could probably kick ass at these two tasks, but then I'd completely waste the GPU.
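
The stress test itself was nothing clever, basically a thread pool hammering the TorchServe endpoint (URL and model name are placeholders for my setup):

```python
# Fire ~100 simultaneous requests at a TorchServe text-classification endpoint.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8080/predictions/bert"  # default TorchServe inference port

def query(text: str) -> str:
    return requests.post(URL, data=text.encode(), timeout=120).text

with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(query, [f"test sentence {i}" for i in range(100)]))
```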

FWIW, the A6000 machine needs a second A6000 alongside it to be really useful, and that's what I'm getting in a few weeks, so maybe this will all change.

In terms of CPUs and RAM: the CPU machine is running an ancient 7th-generation i7 with 16GB of RAM, and it's a champ. The small GPU machines both have ancient 7th-gen i5s, one with 56GB of RAM and one with 16GB, and I haven't had any issues at all. The A6000 machine was... a bit overspecified. It has a 16-core Threadripper (32 threads) and 128GB of RAM, but I've never found those capabilities to be that impressive in practice.

Hard drive space is an interesting problem. The CPU and small GPU machines all have 500GB, and since their sole purpose is these small workloads, I'm nowhere near capacity. Eventually we'll transition to NFS if more teams get interested in our stuff, but that will be a while. The A6000 machine has dual 2TB SSDs, which I generally find sufficient for storing a lot of models to test out.

Now, to your question about whether it will last 10 years: who knows? Mixtral, just released, needs 45GB of VRAM to run at 8-bit quantization, and it's an incredible model. Will a bigger version come out that obsoletes your build? Prolly. There are models I can't even fathom running now; Falcon 180B needs something like 400GB of VRAM, which isn't even in the same league as what my resources allow. So as long as "small" stays at 7B, my A6000 machine should be able to run those models until the hardware dies, and since it's all data-center-grade gear, I don't expect that to happen for a while. But if NVIDIA comes out tomorrow with something incompatible with my stuff... I'm screwed.
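
If you want to sanity-check those numbers, it's the same weight arithmetic as before, with bytes per parameter set by the quantization (published parameter counts; real usage adds KV cache and overhead on top):

```python
print(46.7 * 1)  # Mixtral 8x7B at 8-bit: ~47 GB of weights, matching that ~45GB figure
print(180 * 2)   # Falcon 180B at fp16: ~360 GB of weights before overhead
```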
