Guys, can someone help me please? I am having problems with my rig and I cannot find the cause.
I have an ASUS Z170 Maximus VIII Hero with an Intel Core i5-6600K and a single 8 GB stick of RAM. The motherboard has 6 PCIe slots and a single M.2 slot, and I am running HiveOS from a SATA SSD. I am using 1 M.2 to PCIe adapter with a 1 to 4 splitter, for a total of 10 GPUs: 3x RTX 3080, 1x RTX 3060 Ti and 6x RTX 3070. Everything is powered by 1x Corsair AX1000 (1000W 80+ Titanium) and 1x Seasonic PX-1300 (1300W 80+ Platinum).
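To give a rough idea of the power budget, here is a small Python sketch. The per-card wattages are NVIDIA's stock reference board-power figures (assuming the 10 GB version of the RTX 3080), not measurements from this rig, and `power_limit` is just a hypothetical scaling factor for illustration. The real draw depends on the power limits actually set on the cards, and the CPU, risers and fans are not counted.

```python
# Stock reference board power in watts (assumed values, not measured on this rig).
STOCK_BOARD_POWER_W = {"RTX 3080": 320, "RTX 3070": 220, "RTX 3060 Ti": 200}

# Card counts and PSU ratings as listed in the post.
GPU_COUNTS = {"RTX 3080": 3, "RTX 3060 Ti": 1, "RTX 3070": 6}
PSU_RATINGS_W = {"Corsair AX1000": 1000, "Seasonic PX-1300": 1300}


def total_gpu_power(counts, per_card_w, power_limit=1.0):
    """Sum the GPU power draw, scaled by a hypothetical power-limit factor."""
    return sum(n * per_card_w[model] * power_limit for model, n in counts.items())


if __name__ == "__main__":
    capacity = sum(PSU_RATINGS_W.values())
    for limit in (1.0, 0.7):
        draw = total_gpu_power(GPU_COUNTS, STOCK_BOARD_POWER_W, limit)
        print(f"limit {limit:.0%}: GPUs ~{draw:.0f} W of {capacity} W rated PSU capacity")
```

At stock figures the GPUs alone would add up to more than the two PSUs are rated for, so the answer depends entirely on how far the cards are power-limited and how they are split between the two supplies.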
The rig had no problems for almost 1 month (since the last time I added a card). When I added the 10th and last card it didn’t work flawlessly: I had strange problems, the OS saw all of the cards but one of them was displayed as GA104 (MALFUNCTION). After several reboots it finally worked fine, until now.
Sometimes the rig starts correctly, the OS loads and I see every card. It also mines fine for a random period of time (could be a couple of minutes or several hours) and then a random card crashes. I have the watchdog set to restart the rig if the hashrate drops, and when it reboots, sometimes the card that crashed is displayed as GA104 (MALFUNCTION) again, and sometimes the number of GPUs that aren’t working goes up to 2.
I honestly don’t know what is causing this, but something is making the rig unstable.
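To narrow down which card is actually dropping out, one option is to count the NVIDIA Xid errors in the kernel log per PCI address. Below is a rough Python sketch of that idea. The line format in the regex is an assumption based on how the NVIDIA driver usually logs Xid events, and the sample log lines are made up for illustration, they are not from this rig.

```python
import re
from collections import Counter

# Assumed line shape from the NVIDIA kernel driver, e.g.
#   NVRM: Xid (PCI:0000:0a:00): 79, pid=1234, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def count_xid_errors(log_text):
    """Count Xid events per (PCI address, Xid code) in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Made-up sample lines for illustration only.
SAMPLE_LOG = """\
[ 812.1] NVRM: Xid (PCI:0000:0a:00): 79, pid=1234, GPU has fallen off the bus.
[ 812.4] NVRM: Xid (PCI:0000:0a:00): 79, pid=1234, GPU has fallen off the bus.
[ 990.7] NVRM: Xid (PCI:0000:03:00): 62, pid=1301, Internal micro-controller halt
[1002.0] usb 1-2: new high-speed USB device
"""

if __name__ == "__main__":
    # Print the most frequent (address, code) pairs first.
    for (pci, xid), n in count_xid_errors(SAMPLE_LOG).most_common():
        print(f"{pci}  Xid {xid}  x{n}")
```

The real input would be the text of `dmesg` saved after a crash. If the same PCI address keeps showing up, that points at one card, riser or slot. If the address changes from crash to crash, something shared by several cards (power, the splitter) looks more likely.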
Things I tried:
Changing Risers
Reinstalling Drivers
Reinstalling HiveOS
Changing RAM Slot
Things I changed in BIOS:
PCIe Gen1
4G Decoding Enabled
CSM Support Disabled
Intel LAN Disabled (I am running through Wi-Fi)
Onboard Audio Disabled
Primary graphics IGFX (I tried with Auto but it didn’t change anything)
Every SATA port except the one with the SSD is disabled
Every option of CPU/RAM overclock is set to Auto
With 10 GPUs I get strange graphical artifacts and the rig does not POST, it gets stuck at the ASUS logo
If I remove a card from one of the PCIe slots of the motherboard, the problem persists
If I remove a card from the 1 to 4 adapter, the problem goes away and it POSTs, and it seems to work fine
If I remove everything and just use the 1 to 4 adapter with 4 cards, it works fine
The thing is that it basically worked fine for almost a full month before giving me this stupid problem. I didn’t change anything, I just woke up one day and the rig was off, and after turning it back on these problems started.
Does anyone have some tips I could try?