GLM 5.2 on 5× Pro 6000s and a 5090: an expensive journey

This started as something I thought was reasonable. I already had a 5090 for my gaming machine, and I thought a second 5090 would make me happy. Instead, it sent me down a rabbit hole that got completely out of control.

I wanted something that would have full PCIe 5.0 x16 speed across all slots, which started a chain of events that had me spending good money after bad. It was a bit of a nightmare, as every decision I made led to me needing to make even tougher decisions. Couple that with what was actually available, and my hand was forced in a few spots.

I started with the motherboard and worked my way backwards, eventually ending up with this setup. I wanted something close to endgame, but I still made a few concessions:

Threadripper Pro 9975WX
WRX90 Sage SE
4×48 GB DDR5-6400 RDIMM
Antec 900 case (ended up in the bin)

The system started with two 5090s. The Antec 900 is well built, with huge space, smart connections, and refined edges, but ultimately it did nothing at all to support the GPUs. In a case this large and at this price point, that is a huge failure on their part, and for that reason I recommend avoiding it. If they had put $1 worth of bracketry in the machine to support GPUs, I’d give it a 10/10. With the lack of support, it is nearly useless unless you deal with it yourself, which I did, as you can see in the images. It’s like buying a Ferrari and having it delivered without any petrol.

With the two 5090s, I was working with smaller Qwen models, which seemed great, but it was clear that with the limited VRAM and my desire for additional sidecars like a VL (vision-language) model, I needed something more. I had huge plans, and the models were just too small to deal with the complexity.

So I got my first Pro 6000. I coupled it with a 5090, which made for weird tensor splits, but llama.cpp did a good job of divvying it all out. But now I was working with 120B-parameter models with almost no space for context. So it was smarter, but also a goldfish.
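For anyone wondering what a "weird" split looks like in practice, here is a rough sketch of deriving proportions for llama.cpp's `--tensor-split` option from each card's VRAM. The VRAM figures (96 GB for a Pro 6000, 32 GB for a 5090) and the helper names are assumptions for illustration, not the values actually used in this build; real splits usually need tuning to leave headroom for context.

```python
from functools import reduce
from math import gcd


def split_ratio(vram_gb):
    """Reduce per-GPU VRAM sizes (whole GB) to the smallest integer proportions."""
    divisor = reduce(gcd, vram_gb)
    return [v // divisor for v in vram_gb]


def tensor_split_flag(vram_gb):
    """Format the proportions as a llama.cpp --tensor-split argument string."""
    return "--tensor-split " + ",".join(str(p) for p in split_ratio(vram_gb))


if __name__ == "__main__":
    # Assumed sizes: one Pro 6000 (96 GB) plus one 5090 (32 GB).
    print(tensor_split_flag([96, 32]))
```

With those assumed sizes the split comes out at 3:1, which is why mixing the two cards is lopsided: the 5090 only ever holds a quarter of the model.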

Then I went to 2× Pro 6000 + 5090. Now I had the space for context. But in reality, the jump from 27B to 120B did not knock my socks off. I could get a bit farther now. I was at about 90% with the 27–35B models, and with the 120B models I was at about 95%. But 95% is about as useful as 90% if I can’t close the loop. If I can’t actually finish the task, it’s all for nothing.

In came 3× Pro 6000. Now I was in the MiniMax range, and finally I was getting somewhere. It was like I got concierge service at a ball game. My needs were being met, and I got answers for everything. Many of them were completely wrong answers, though. I had tons of code that was poorly made and led to dead ends and rewrites.

4× Pro 6000 created an issue that I knew would come. I had been seeing several folks claim that they were able to deal with the thermal issues that came with side-by-side Pro 6000 cards. I knew they were likely not telling the truth, but I also knew a rebuild was probably in order anyway.

So, as you can see in the image, I placed four side by side and had thermal issues, even with the additional fans in the image and a 27-inch box fan sitting on top, which is not shown. I clocked things down a bit and still had a few system freezes. I gave up immediately and went to the high-rise.

I got a couple of open-case designs and connected them together, thinking every two or three GPUs would get their own floor. It was overly complicated dealing with risers and cooling, so I dumped it pretty quickly.

But now, with GLM and Kimi, I was actually accomplishing things. The quants were tight, though, and my context was low again.

5× Pro 6000 + 5090, along with the release of GLM 5.2, was an absolute game changer. I’m talking 98–99% now. I have plenty of room for context and sidecars, all running on the 5090 at blazing speeds. But blazing is legit: it is producing so much heat now that it’s a problem, and it’s summertime to boot. I had to get a second PSU, which I suppose, in all of this, is not the most ridiculous bit.

At full tilt, with 100% GPU usage for 30 minutes in this custom extruded aluminium design, with an outrageous number of fans in a ~20°C basement, the GPUs top out at about 70–75°C, which I’m very happy with.

I finally do not desire another GPU, as all my needs seem to be met. Was it worth it? LOL, no. Absolutely not. This was a terrible idea. DO NOT DO THIS. I figure that at the rate I’m generating tokens, it will take over 10 years to break even at today’s prices, and that’s not accounting for electricity bills.
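The break-even estimate is simple arithmetic, and a sketch of it might look like the following. Every input here is hypothetical (hardware cost, tokens per day, the hosted price per million tokens, power draw); none of these figures come from this build, so plug in your own.

```python
def break_even_years(hardware_cost, tokens_per_day, price_per_million,
                     power_kw=0.0, price_per_kwh=0.0, hours_per_day=24.0):
    """Years until the value of locally generated tokens covers the hardware cost.

    All arguments are hypothetical inputs; cost and prices share one currency.
    Returns float('inf') if electricity costs at least as much as the tokens are worth.
    """
    daily_token_value = tokens_per_day / 1e6 * price_per_million
    daily_power_cost = power_kw * hours_per_day * price_per_kwh
    daily_net = daily_token_value - daily_power_cost
    if daily_net <= 0:
        return float("inf")
    return hardware_cost / daily_net / 365


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function.
    print(break_even_years(36_500, 10_000_000, 1.0))
```

Setting `power_kw` and `price_per_kwh` to anything realistic pushes the number out further, and it can easily go to infinity, which is the point of the warning above.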

I’ve never used the frontier models before, but I’ve seen the reviews and the speeds, and I’ll never match those with open weights. But it was a fun journey.

I deleted the electricity company’s app from my phone so they’d forget about me for now.
