I have a strange problem and have exhausted all the options I can think of, so I am reaching out to the community for help.
My rig powers up and boots into HiveOS, and all 7 of my GPUs are typically detected and recognized. After some time (it may be minutes, an hour, or a couple of hours) I notice that my hash rate has dropped, and when I look at my miner I see that it has lost one of the cards. Sometimes it is more than one card, and they are not always the same ones.
Here is my setup and what I have tried so far:
- MSI Z390-A-PRO motherboard (the MB has a PCIe power connector that I currently have powered even though the risers have their own power, but I also tried without the PCIe power cable plugged into the board)
- i5-9600K CPU
- 128 GB SSD for HiveOS (plugged to SATA 1 port)
- 2 x 8GB RAM
- 1 x M.2 to PCIe adapter
- 2 x 1000 watt power supplies
- 7 x PCIe risers (tested and confirmed that they work). All risers are powered from the VGA 6-pin cables coming out of the PSU (no Molex or SATA cables here)
- All risers are powered from the same power supply that is powering the associated GPUs
- 3 x RTX 3060s
- 4 x RTX 3070s
- HiveOS 0.6-203@210410 with NVIDIA driver 460.67 >> miner is set to start with a 15-second delay, and the OC settings are set to kick in after 120 seconds.
My BIOS is set to Gen 1/2 and 96 for PCIe. I disabled everything that I don't need (audio controller, on-board video, virtualization, etc.), enabled 4G, set Windows 10 WHQL Support to UEFI, changed the power settings to turn on, disabled serial ports…
I have tried the most up-to-date BIOS for the MSI Z390-A-PRO, and also tried a few of the older versions. I think the latest BIOS firmware was the least stable and most inconsistent. I am currently running a version from 2019.
Here is a sample of what it looks like when one of the cards is dropped. This time it took only about 20 minutes after the restart. Before that it took 2 hours…
I have also played with the overclocking settings and it does not seem to matter. In fact, I think the longest it ran (36+ hours) was with higher OC settings.
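To pin down exactly when a card drops, and which one, something like the following could be left running in a second console. It is only a minimal sketch: the names `parse_gpu_list`, `missing_gpus`, `query_gpus` and `watch` are made up for illustration, the 30-second polling interval is arbitrary, and it assumes `nvidia-smi` is on the PATH and still responds after a card is lost (it may hang or return only errors, in which case every card is reported as missing).

```python
import subprocess
import time
from datetime import datetime

EXPECTED_GPUS = 7   # 3 x RTX 3060 + 4 x RTX 3070 on this rig
POLL_SECONDS = 30   # arbitrary polling interval (assumption)

# Standard nvidia-smi query: one CSV line per GPU, without a header row.
QUERY = ["nvidia-smi", "--query-gpu=index,name,pci.bus_id", "--format=csv,noheader"]


def parse_gpu_list(output):
    """Turn nvidia-smi CSV output into a dict of PCI bus id -> card name."""
    gpus = {}
    for line in output.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3:  # skip error text or partial lines
            _index, name, bus_id = parts
            gpus[bus_id] = name
    return gpus


def missing_gpus(baseline, current):
    """Return the bus ids seen at startup that are absent now, sorted."""
    return sorted(set(baseline) - set(current))


def query_gpus():
    """Run nvidia-smi; return an empty dict if it fails or times out."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, timeout=20)
    except (OSError, subprocess.TimeoutExpired):
        return {}
    return parse_gpu_list(result.stdout)


def watch():
    """Record the GPUs present at start, then log any that disappear."""
    baseline = query_gpus()
    print(f"{datetime.now().isoformat()} baseline: {len(baseline)}/{EXPECTED_GPUS} GPUs")
    while True:
        time.sleep(POLL_SECONDS)
        for bus_id in missing_gpus(baseline, query_gpus()):
            print(f"{datetime.now().isoformat()} missing: {baseline[bus_id]} at {bus_id}")
```

Calling `watch()` right after boot, before the OC settings kick in, gives a baseline of all detected cards. The PCI bus id in each "missing" line should make it possible to tell whether the drops follow a particular slot or riser, or move around between cards.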
Two observations:
- I noticed that the motherboard clock is off. I changed the CMOS battery and reset the board, but when I adjust the date and time, the board does not retain them. The BIOS does not even register the adjustment as a change, so it does not prompt me to save. I have tried adjusting the date and time along with some other changes that I needed to make anyway, so that I could save and exit, but no luck: the next time I go into the BIOS, the date and time have changed again. The clock is typically a day ahead and several hours off.
- When HiveOS boots up I see a warning message in the console that the time is messed up. I am not sure if this could be causing all my troubles.
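To put a number on the clock problem from inside HiveOS, the hardware clock can be compared with the system clock. This is a rough sketch under a few assumptions: the board's RTC is exposed as `rtc0` under the standard Linux sysfs path, the RTC is kept in UTC, and the system clock has already been corrected over the network (otherwise both values may be wrong). The function names are made up for illustration.

```python
import time
from pathlib import Path

# Assumption: the motherboard RTC is the first RTC device (rtc0).
RTC_PATH = Path("/sys/class/rtc/rtc0/since_epoch")


def read_rtc_epoch(path=RTC_PATH):
    """Return the hardware clock as seconds since the Unix epoch."""
    return int(path.read_text().strip())


def describe_offset(rtc_epoch, system_epoch):
    """Describe how far the hardware clock is from the system clock."""
    delta = int(rtc_epoch - system_epoch)
    direction = "ahead of" if delta >= 0 else "behind"
    days, rest = divmod(abs(delta), 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    return f"RTC is {days}d {hours}h {minutes}m {direction} system time"


def report():
    """Print the current RTC offset relative to the system clock."""
    print(describe_offset(read_rtc_epoch(), time.time()))
```

Running `report()` after each boot would show whether the offset is always the same (which would point to the board never storing the new time) or drifts between boots.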
I am at a loss - any help and suggestions are appreciated!