Wdog Thread(s) not responding - I'm Losing It

I created a Reddit post, but no luck so far.

I have an 8 x 5700 XT rig with a 1600W PSU, consuming 1200W at the wall. It runs the latest HiveOS beta. With both teamredminer and phoenixminer, it randomly reboots and sometimes freezes. I have tried everything I can think of. At random, it reports “wdog GPU not responding” for several GPUs, then “wdog Thread(s) not responding”, and then it tries to reboot. Sometimes the reboot succeeds, sometimes the whole rig freezes. I get a similar error with teamredminer (it reports a dead GPU).

I thought it was the risers at first, but all the “not responding” errors occur at the very same time, and the GPUs that fail are random too. You can check the log here
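One way to check the riser theory against the log is to group the “not responding” events by time. A single bad riser would tend to implicate the same GPU again and again, while a shared cause (PSU, motherboard, driver) would show several different GPUs failing within the same few seconds. Below is a minimal Python sketch. It assumes the events have already been extracted from the miner log as (timestamp in seconds, GPU index) pairs; the sample data, the function name `group_events` and the 5-second window are made up for illustration and are not taken from the actual log.

```python
from collections import Counter

def group_events(events, window_s=5.0):
    """Group (timestamp_s, gpu_index) events into bursts.

    An event joins the current burst if it happens at most
    window_s seconds after the previous event in that burst.
    """
    bursts = []
    for ts, gpu in sorted(events):
        if bursts and ts - bursts[-1][-1][0] <= window_s:
            bursts[-1].append((ts, gpu))
        else:
            bursts.append([(ts, gpu)])
    return bursts

# Hypothetical sample data, NOT taken from the real log.
events = [(100.0, 2), (100.4, 5), (101.1, 7), (9000.0, 0), (9000.2, 3)]

bursts = group_events(events)
for burst in bursts:
    gpus = sorted({gpu for _, gpu in burst})
    print(f"t={burst[0][0]:.0f}s: {len(gpus)} GPUs failed together: {gpus}")

# How often each GPU shows up overall; one bad riser would dominate this count.
per_gpu = Counter(gpu for burst in bursts for _, gpu in burst)
print(per_gpu.most_common())
```

If most bursts contain several different GPUs and no single GPU dominates the count, that points away from one faulty riser and toward something the cards share.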

After that I thought maybe it was the OC profile. I gradually increased VDD from 740 to 850, no luck. Mem is at a reasonable 895 and core is at 1375. I have always used the latest versions of the miners (for a short while I used phoenixminer v5.0e, no luck again). I’ve never added any extra options to the miner configuration.

I have been running HiveOS from a USB stick. I tried several sticks, and now it runs from a 120GB SSD. No luck, again.

The driver version is 19.30, which should be stable. The board is an ASRock H110 Pro BTC+. I adjusted the motherboard BIOS settings, too. It just resets too often and sometimes gets stuck. Since there is no pattern to this behaviour (it ran 3 days straight last week without any trouble), I’m simply helpless.

Hardware info: G3930, 2x4GB RAM, 6 * MSI 5700 XT MECH OC + 2 * MSI 5700 XT MECH. Each of the 8 GPU + riser pairs is powered separately. The motherboard is powered from separate outputs, too (this board needs a SATA and 2 Molex connectors to power the PCIe slots). Temps max out at 57 for core and 88 for memory, in Celsius.

-Tried unplugging 2 of the cards to see whether the PSU is adequate; the problem remained. I cannot check whether the cards are stable without the OC profile because they draw too much power.

-I saw a thread where someone got a stable rig with kernel 5.0.21. The latest beta with this kernel is hiveos-beta-0.6-140@200520, so I will try this one and update this post accordingly. Update: this didn’t work (too many errors), so I switched back to the latest beta kernel.
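On the PSU question, the arithmetic behind the "unplug 2 cards" test can be sketched in a few lines of Python. The 1600W rating and the 1200W wall figure are from this post. The 90% PSU efficiency is an assumption (it is not stated anywhere here), and the 6-card estimate assumes the draw is spread evenly over the cards, ignoring CPU, board and fans.

```python
PSU_RATED_W = 1600          # from the post
WALL_DRAW_W = 1200          # from the post, measured at the wall
ASSUMED_EFFICIENCY = 0.90   # assumption: not stated in the post

# Wall draw includes conversion losses, so the DC load on the PSU is lower.
dc_load_w = WALL_DRAW_W * ASSUMED_EFFICIENCY
load_fraction = dc_load_w / PSU_RATED_W
print(f"Estimated DC load with 8 cards: {dc_load_w:.0f} W "
      f"({load_fraction:.0%} of the PSU rating)")

# Rough estimate with 2 of the 8 cards unplugged
# (assumes the load is split evenly across the cards).
dc_load_6_cards_w = dc_load_w * 6 / 8
print(f"Estimated DC load with 6 cards: {dc_load_6_cards_w:.0f} W "
      f"({dc_load_6_cards_w / PSU_RATED_W:.0%} of the PSU rating)")
```

Under these assumptions the PSU sits well below its rating in both cases, which is consistent with the problem persisting after two cards were removed. It does not rule out a problem on an individual cable or connector.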

Hey, greetings.

I have 3 rigs. Each rig runs 8 x 5700 XT with a 1600W PSU, like yours. Each rig draws about 1150W from the wall. I am also running the latest HiveOS beta on an SSD with Phoenixminer. On Rig #3, the one in the picture, I am running the ASRock H110 Pro BTC+ motherboard as well.

  1. First thing I did was update the BIOS, which is at 1.60. (https://www.asrock.com/MB/Intel/H110%20Pro%20BTC+/#BIOS)
  2. I am using a Core i3 7th gen processor with this motherboard. (Intel Core i3-7100 7th Gen Core Desktop Processor 3M Cache,3.90 GHz).
  3. I am also using 16GB of Corsair RAM for this motherboard (just in case I ever want to switch to Windows one day) [Corsair Vengeance LPX 16GB (2x8GB) DDR4 DRAM 2400MHz C16 Desktop Memory Kit - Black (CMK16GX4M2A2400C16)]
  4. For SSD, I am using a Kingston 120GB SSD. [Kingston 120GB A400 SATA 3 2.5" Internal SSD SA400S37/120G]
  5. For WiFi, I am using TP-Link. [TP-Link USB WiFi Adapter for PC (TL-WN725N), N150]

All my powered risers use a 6-pin PCIe connection. My PSU has 8 GPU cables, and on these I use 2 splitters. The first splitter powers both GPU connections; the second connects to the first and sits between the riser and the power from the PSU.

What I did was wrap each of the board connectors in electrical tape to prevent the metal of one touching the metal of the other, since they are sooooo close together.

I am also using a hardware watchdog that I just installed. I don’t know yet whether it automatically works and integrates with HiveOS. When I run a restart from the shell, I hear the relay “click” to restart the system.


Also, I am using a modified BIOS on each card. I used both MPT and RBE to create the new values. I use the same modified BIOS across all 5700 XT cards and flash them through HiveOS, but only AFTER I have saved the original BIOS from each and every card first. Oh yes, I am using the onboard video from the motherboard for my monitor. I do not use the video output of any GPU.

Hey, thanks for the info. I have several questions regarding the screenshots. I haven’t updated the BIOS yet; I definitely will.

But how do you get such low power usage? I am using XFX cards (5700 XT Double Dissipation) with Micron GDDR6, and each draws around 130W according to software. What am I doing wrong? I did a BIOS mod to lower the VDD limit to 725 and copied the 1550MHz memory timings. What else should I have done? Other than that, our setups are almost identical. I have used electrical tape too. I changed the way I power the cards: the PSU came with pre-split cables, so I used an external splitter and now power each GPU + riser from one GPU output. My problem is solved, so maybe my first setup was causing the instability, but I really wonder how you managed these numbers. I am hovering at 1100 watts according to software.
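For reference, here is how the two software figures above relate, as a small Python sketch. Both inputs are the approximate numbers quoted in this post ("around 130W" per card, "hovering at 1100 watts" in total), so the result is only indicative.

```python
GPU_COUNT = 8
PER_GPU_SOFTWARE_W = 130   # "around 130W" per card, from the post
TOTAL_SOFTWARE_W = 1100    # "hovering at 1100 watts", from the post

gpu_sum_w = GPU_COUNT * PER_GPU_SOFTWARE_W       # what 8 cards at 130W add up to
gap_w = TOTAL_SOFTWARE_W - gpu_sum_w             # difference from the reported total
avg_per_gpu_w = TOTAL_SOFTWARE_W / GPU_COUNT     # average if the total were all GPU power

print(f"8 cards x {PER_GPU_SOFTWARE_W} W = {gpu_sum_w} W")
print(f"Gap to the reported total: {gap_w} W")
print(f"Average per card from the total: {avg_per_gpu_w} W")
```

The gap between the two readings is small enough to be explained by rounding of the "around 130W" figure or by cards drawing slightly different amounts.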

What you need to do is follow the values described in this thread as a starting point:

I even created my own thread over there:

Probably the best piece of advice was to use the Apple straps. Most people say something like, “just copy the timings from 1500 or 1550 down to the rest of the timings” … well, using the Apple strap values is better.

Apple Strap Values:
Option 1 (preferable and recommended): apply the Apple Inc. VRAM timing straps linked below once for MT61K256M32 (Micron) and save the vBIOS. Then load the saved vBIOS, apply the straps again for K4Z80325BC (Samsung) if it is present, and save the vBIOS again. https://www.igorslab.de/community/attachments/k4z80325bc-mt61k256m32_gddr6_optimized_timings-zip.6544

Hello, sir. I have this problem too. My rig worked for 10 days nonstop, but yesterday Phoenix miner rebooted by itself. The error says GPU * is not responding. So I tried switching from Phoenix to a different miner, like teamred or gminer, and it was fine. But when I tried to switch back to Phoenix, it still would not mine. Could you please help me deal with this problem?
