5700xt keeps crashing after a few hours

My rig has four 5700 XTs.
It crashes after anywhere from 20 minutes to 18 hours; I’ve never gotten it to run longer than 18 hours.

Error logs

Miner logs

Average speed (10s): 0.00mh/s | 48.23mh/s | 0.00mh/s | 0.00mh/s Total: 48.23 mh/s
New job received: 0xd01ff5 Epoch: 387 Target: 000000006df37f67
Stuck device detected, invoking emergency script

The real problem: OS logs (repeated over and over)

Jan 10 12:25:23 hive5700XT kernel: [58483.988705] amdgpu: Failed to export SMU metrics table!
Jan 10 12:25:28 hive5700XT kernel: [58488.988954] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
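To see how often each message shows up and how long the rig had been up when it first appeared, something like the following Python sketch could help. It assumes the log lines have the same layout as the two above (the bracketed number is seconds since boot); the name `summarize_amdgpu` is hypothetical.

```python
import re
from collections import Counter

# Captures the uptime in seconds and the amdgpu message from a kernel log line.
KERNEL_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] amdgpu: (.+)$")

def summarize_amdgpu(lines):
    """Count each amdgpu message and record the uptime of its first occurrence."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        m = KERNEL_RE.search(line.strip())
        if not m:
            continue
        uptime, msg = float(m.group(1)), m.group(2)
        counts[msg] += 1
        first_seen.setdefault(msg, uptime)
    return counts, first_seen

sample = [
    "Jan 10 12:25:23 hive5700XT kernel: [58483.988705] amdgpu: "
    "Failed to export SMU metrics table!",
    "Jan 10 12:25:28 hive5700XT kernel: [58488.988954] amdgpu: "
    "Msg issuing pre-check failed and SMU may be not in the right state!",
]
counts, first_seen = summarize_amdgpu(sample)
for msg, n in counts.items():
    print(f"{n}x, first at {first_seen[msg] / 3600:.1f} h uptime: {msg}")
```

Feeding it the full kernel log instead of `sample` would show whether the first SMU error lines up with the moment the miner reports a stuck device.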

I’ve tried numerous OC settings; the current ones are pretty conservative and give good temps.

What I’ve tried so far:

  • Updated the B250 motherboard with this guide
  • Changed risers, splitters and power cables
  • Ran a single card at stock; it crashed after ~20 hours (same OS error log)
  • Updated Hive OS to 0.6-191@210109
  • Switched miners from PhoenixMiner to lolMiner (same OS error log)

Other info:

  • Kernel 5.0.21-201105-hiveos
  • AMD Driver OpenCL 20.30

I can’t find much information about these errors. Most of the reports I found online relate to monitor issues, which don’t apply to me.

Anybody else encountered this issue?

Maybe a power supply issue. The wattage reported by Hive and other software is lower than the actual draw, which is why people get a meter that plugs into the wall to measure power draw.

I just put a Sapphire in my system (See GPU 2) and it has been running fine, so I have not modded the vBIOS yet.

Try using my overclock setting and see how it works.

Oh, also, I am using the Asus B250 on the latest BIOS (updated via web), not using the PCIe x16 slot, and all PCIe slots are set to Gen2. Just FYI, I read the protocol you linked and I think I followed the same instructions.

For your risers, how are you powering them? I know many come with SATA power cables but those are unreliable at best and dangerous at worst. I power all of mine using the 6-pin PCIe cable. AND, if you are running multiple power supplies the riser and card need to be on the same PSU.

I have 6x RX 5700 XT. I found that undervolting the memory controller caused stability issues. Try removing your undervolts for those.

Then slowly work them all down over 24 hours again.

I ended up changing the mobo to a brand new TB250-BTC. I was powering the risers through PCIe cables, then switched to Molex, with a single Molex cable per riser. Unfortunately, I’m still encountering the same problem.

Could it be the PSU? How many watts is yours rated vs how much is being pulled at the wall (many people use a “kill-a-watt” for this)?

Have you tried running 3 cards with overclocks (and maybe more importantly undervolts) set to try to reduce power draw?

I would look at the power supply pretty hard here, it seems like you have changed everything else.
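As a rough way to judge the numbers once you have a wall reading, here is a small Python sketch. A PSU's rating refers to its DC output, while a wall meter shows AC input, so the sketch scales the wall reading by an assumed efficiency. The 90% efficiency and the 800 W / 1000 W figures are placeholders, not values from this thread, and `psu_load` is just an illustrative name.

```python
def psu_load(wall_watts, rated_watts, efficiency=0.90):
    """Estimate DC-side load as a fraction of the PSU's rated output.

    efficiency is an assumed AC-to-DC conversion efficiency; check the
    PSU's own spec sheet for a realistic figure.
    """
    dc_watts = wall_watts * efficiency
    return dc_watts / rated_watts

# Hypothetical example numbers, not measurements from this rig.
load = psu_load(wall_watts=800, rated_watts=1000)
print(f"{load:.0%} of rated output")
```

A common rule of thumb is to keep sustained load well below the rating, so a result close to 100% would make the PSU a prime suspect.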

How did you mod your 5700 XT with Samsung memory? Mine keeps generating rejected shares.

Mine restarts every 10 to 18 hours or so. I just don’t care; I let it restart.

I followed the directions for modding the vBIOS here: https://www.igorslab.de/community/threads/gigabyte-5700-xt-bios-mod-fails.3304/post-79315

Be sure you are on the silent switch, as the OC position only gives you a single flash before it locks down.

Can you please tell me how to get the values you’re getting for your Gigabyte?
With your settings I’m getting 53 MH/s.
Does the vBIOS make such a difference in performance, temperature, and consumption?
Where can I get it for Gigabyte Radeon RX 5700 XT GAMING OC 8G Rev 1.0?
