5700 xt rigs crashing on 5.4.80

I have 2 rigs running 8x 5700 XT each (with one 5700). I upgraded to HiveOS version 5.4.80 this morning, and both rigs immediately started crashing because TeamRedMiner detected a GPU as dead. There were more than 100 reboots in less than 4 hours between both rigs. After extensively troubleshooting both rigs, including taking multiple cards apart and cleaning and adjusting the thermal pads (the cards with errors had the stock thermal pads replaced with Thermal Grizzly Minus Pad 8), I ended up downgrading to 5.4.0. Both rigs have now been running stable again for the past 45 minutes.

Yes, I had a similar problem and downgraded to the stable version.

So this is why my 5700 rig keeps crashing. One card specifically. It started not long after I updated to 0.6-204@210608. I went mad switching out risers and putting the “dead” GPU into my desktop, where it worked fine. Such an insanely stressful couple of days.

Will downgrade to 0.6-203@210604 and see if that solves the issue with my 5700 rig.

Also, upon updating to 0.6-204@210608 on my other rig of R7’s, I had issues with my network not seeing the rig, couldn’t ping it and SSH didn’t work. Just downgraded to 0.6-203@210604 and can SSH into the rig no problem now.

My Vega 64 rig had 98 invalid shares in less than 24 hours on 5.4.80. I downgraded to 5.4.0 and am once again averaging only 1-2 invalid shares per day. There are definitely some problems with 5.4.80.

Edit: I actually downgraded from 5.4.80 / 0.6-204@210608 to 5.4.80 / 0.6-203@210608.

How did you downgrade to the 5.4.0 kernel? Did you reflash to an older version rather than using the downgrade feature in the dashboard? My rig is relatively stable currently with version 0.6-203608, kernel 5.4.80. I don’t know how to downgrade the kernel, other than re-flashing my SSD with an older image.

In any case, my AMD rigs are stable with 5.4.80/0.6-203608, while my NVIDIA rigs are rock solid on the newest version and kernel: almost a month of uptime, compared to 15 hours of stability at most with my AMD rigs.

Annoying to troubleshoot say the least.

I used the downgrade button in the HiveOS iOS app. For my 5700 XT rigs, I was able to downgrade to 5.4.0 / 0.6-203@21608. Apparently the option for the 5.4.0 kernel was no longer available when I downgraded my Vega 64 rig, so I’m running 5.4.80 / 0.6-203@210608 there. But it is running stable now, so I’ll keep it there for the time being.

FYI if you use the web interface instead of the app, there are many more downgrade options available (see the attached screenshot).
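
For anyone trying to keep track of which build they are actually on, here is a rough Python sketch that splits version strings like the ones posted in this thread into a build number and a date. The `<series>-<build>@<yymmdd>` layout is only my guess from the versions quoted here, not something taken from HiveOS documentation, and `parse_hive_version` and `HiveVersion` are made-up names for illustration.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class HiveVersion(NamedTuple):
    series: Optional[str]  # e.g. "0.6"; None when written in the short "203@210721" form
    build: int             # e.g. 203
    released: date         # decoded from the suffix, ASSUMING it means yymmdd


# Assumed layout: optional "<series>-", then "<build>@<yymmdd>".
_VERSION_RE = re.compile(r"^(?:(\d+\.\d+)-)?(\d+)@(\d{2})(\d{2})(\d{2})$")


def parse_hive_version(text: str) -> HiveVersion:
    """Parse a string such as '0.6-203@210608' (hypothetical helper)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    series, build, yy, mm, dd = match.groups()
    # date() itself raises ValueError for impossible months or days.
    return HiveVersion(series, int(build), date(2000 + int(yy), int(mm), int(dd)))


if __name__ == "__main__":
    for raw in ["0.6-204@210608", "0.6-203@210604", "203@210721"]:
        print(raw, "->", parse_hive_version(raw))
```

Strings that do not fit the assumed layout (for example a suffix with a missing digit) raise `ValueError` instead of being guessed at, which makes typos in a version string easy to spot.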

Dang, I just tried to add a 5600 XT to my NVIDIA rig last night and TeamRedMiner gave me the same error: GPU dead. lol

I will try doing this, been looking for any ideas on why it was doing that. Thanks

I had the same problem, but I took some time and changed some OC settings, and it seems to work now.
(If someone has a suggestion or better settings, feel free to let me know, thanks.)


image
image

@raid2606 what were your OC settings before you changed them?

I’ve got my cards flashed with a modified 5700 (non-XT) BIOS per this guide: Rx 5700 xt - 58.4MH/s 95w. Not sure if it’s the modified BIOS or my OC settings, but 0.6-204@210608 did not like my setup at all.

My OC settings, getting 54.5-58.5 MH/s (running stable since downgrading HiveOS):
Core: 1350
VDD: 750
VDDCI: 750
Mem: 890-900 (Samsung) / 940-950 (Micron)

The picture shows my settings from before the HiveOS update, but also from before the summer rise in temperature.
I had a few more hashes then. I might try those settings with a modded BIOS, but at the moment it has been working with no reboots for a few days, so I don’t wanna mess it up.

I have the exact same problem. I have two rigs with 5700 XTs, and on both of them I cannot install anything above 0.6-203@210715 or a GPU gets reported as dead.
Even if I downclock everything to stock I still get crashes. Does anyone have a solution for this?

@calado90 my solution as described was to downgrade to 5.4.80 / 0.6-203@210608. I’ve been running stable on that firmware since I posted this.

Yeah, I keep it stable on that version too. But it’s a shame, because now we cannot upgrade to get new features.

Same issue here, and I racked my brain trying to figure it out. 2 of my 3 5700 XTs would fail on the newest version. The only fix I found was to downgrade to 203@210721. It has the newest update to TRM, which is good, but it’s unfortunate since we can’t get the newest Hive features.

Same issue here. I went to version 203@210604 and it works fine now.

Did anybody contact HiveOS about this issue? I still cannot upgrade my rigs with 5700XT’s past 203@210608 without crashes.

I’m on Sapphire Pulse cards, anyone else?

Did you read the changelog? Do you all know that TRM made some big changes allowing you to use less core clock and voltage? Maybe your old tuning is just too much after the update.
