How to undervolt really low?

Yes, all my AMD cards are PowerColors.

HaloGenius, could you ask the dev team to add a note about this specific problem to the 0.5-54 upgrade? At least put it in the changelog, maybe.

@Bagster
I already reported this nuance to the RU community when announcing the new version (Telegram channel, forums, etc.) and also on the bitcointalk.org forum.
The Russian community pinned my announcement (in Russian, of course) in the RU channel. Yes, I agree it would be much better if this information were in the Announce channel.
The Bitcointalk announcement was as follows:

v0.5-54

  • IMPORTANT! New AMD undervolting technique, more reliable core voltage states now. Though it’s slower to apply.
  • Invalid shares reported by Claymore for each GPU

Additional notice:
In version 0.5-54, a new mechanism is used to reduce power consumption. As a result, a GPU core voltage that was set previously may need to be adjusted.

If this is OK, we can add this info to the Announce channel as well. Or you can propose a more informative variant.

How about reverting it or offering the old version in addition to the new version given it has issues for at least those with PowerColors?

Maybe just a note like this:

WARNING: New power consumption mechanism seems to introduce fan settings problems on some Powercolor (to be determined) cards.

Is over 40 cards personally enough to validate that? I have 570s and 580s, Red Devils, Red Dragons, Golden and non. I have a whole DC full of these cards, so several hundred, and I can’t in good faith upgrade to the latest HiveOS until this issue is resolved. I’m trying various BIOS changes, over/underclock changes, etc., and keep getting GPU hangs or 0% fan speeds. When I revert to the previous amd-oc, there are no issues.
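
For anyone who wants to spot the 0% fan condition across many cards without opening the dashboard, here is a minimal sketch. It assumes the amdgpu driver exposes `pwm1` and `pwm1_max` under each card's hwmon directory in sysfs; the function names are made up for illustration and are not part of HiveOS or amd-oc.

```python
from pathlib import Path


def fan_percent(hwmon_dir):
    """Return the fan duty cycle in percent (pwm1 / pwm1_max), or None if unreadable."""
    hwmon = Path(hwmon_dir)
    try:
        pwm = int((hwmon / "pwm1").read_text().strip())
        pwm_max = int((hwmon / "pwm1_max").read_text().strip())
    except (OSError, ValueError):
        return None
    if pwm_max <= 0:
        return None
    return round(100 * pwm / pwm_max)


def stalled_fans(drm_root="/sys/class/drm"):
    """List hwmon directories whose fan duty cycle currently reads 0%."""
    stalled = []
    # Assumed layout: /sys/class/drm/cardN/device/hwmon/hwmonM/
    for hwmon in sorted(Path(drm_root).glob("card*/device/hwmon/hwmon*")):
        if fan_percent(hwmon) == 0:
            stalled.append(str(hwmon))
    return stalled


if __name__ == "__main__":
    for path in stalled_fans():
        print("fan at 0%:", path)
```

A reading of 0% only says what the driver reports for the duty cycle; it does not explain why the fan setting was lost.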

@seanjnkns, can you try moving the fan lines (200-201 in the v0.5-54 amd-oc) above the core/voltage lines (174-193)? If that doesn’t work, can you delete the fan lines or put # before them?
Something is causing wolfamdctrl to break on your cards, but I don’t know what.
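
If you need to make that edit on more than a handful of rigs, a rough sketch of the two variants (move the fan lines up, or comment them out) could look like this. The helper names are hypothetical, and the line numbers only make sense against an unmodified v0.5-54 amd-oc, so work on a backup copy and check the result by eye.

```python
def _split(script_text):
    """Split into lines, making sure the last line ends with a newline."""
    lines = script_text.splitlines(keepends=True)
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    return lines


def comment_out(script_text, first, last):
    """Prefix lines first..last (1-indexed, inclusive) with '#'."""
    lines = _split(script_text)
    for i in range(first - 1, min(last, len(lines))):
        if not lines[i].lstrip().startswith("#"):
            lines[i] = "#" + lines[i]
    return "".join(lines)


def move_block_before(script_text, first, last, target):
    """Move lines first..last so they sit immediately before line `target`."""
    lines = _split(script_text)
    if not (1 <= target < first <= last <= len(lines)):
        raise ValueError("expected 1 <= target < first <= last <= line count")
    block = lines[first - 1:last]
    rest = lines[:first - 1] + lines[last:]
    return "".join(rest[:target - 1] + block + rest[target - 1:])


# Example with the line numbers from the post above (file name is an assumption):
#   text = open("amd-oc").read()
#   moved = move_block_before(text, 200, 201, 174)   # fan lines above core/voltage
#   disabled = comment_out(text, 200, 201)           # or disable the fan lines
```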

Testing moving the lines now.

For information (if others have a similar problem): rebooting was not sufficient for me to correct the problem. I had to shut down and remove power.

@brnfex, I tried your suggested modification and it worked across 4x10 RD570/580 mixes of Elpida, Hynix and Micron RAM.
@Bagster, simply powering off temporarily would fix the issue, but a subsequent reboot could, and typically did, undo it.

Glad to hear that, @seanjnkns :slight_smile: So you moved the fan lines before core/voltage ones and it worked? Are you using fixed fan speed, or left it on auto?

Fixed fan speeds of 80%. However, if you re-apply the OC settings after boot, the system is prone to crashing with a null pointer dereference kernel stack trace, or the miner will get stuck in uninterruptible D or Z state.
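
To check whether a miner has ended up in D (uninterruptible sleep) or Z (zombie) state, one option is to read the state field from `/proc/<pid>/stat`. The sketch below is illustrative only and its function names are invented. Note that a process can sit in D state briefly during normal I/O, so only a process that stays there is a concern.

```python
from pathlib import Path


def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def stuck_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm, state) for processes in D or Z state."""
    found = []
    for stat in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_state(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unparsable
        if state in ("D", "Z"):
            found.append((pid, comm, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, state in stuck_processes():
        print(pid, comm, state)
```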

Nope, I spoke too soon. After further testing, the original problem of a 0% fan speed is back.

For now, reverting back to the old amd-oc. Had enough downtime guinea pigging my servers.

I think we should scratch 0.5-54; I don’t think it is applicable to “general” users.

I don’t even think it’s a “general” user issue. The “updated” amd-oc should be left as an option for users to use at their own risk, as it’s experimental, while keeping the stable one present. Just like was done with dual claymore, for example. You can have the latest, or you can have a prior version. Those that want to be guinea pigs can; those that opt for stability can use a prior version.

Can you try to increase the overclock to 950mV/1150MHz and check if the cards are stable?

I’ll let someone else guinea pig at this rate. Changing the settings doesn’t change the fact that you still get null pointer derefs with this amd-oc; even with the lines moved around in the code, you’ll get 0% fan speeds, etc. If someone else wants to spend a few hours playing with this, more power to them :slight_smile: I just spent 3 hrs on this trying all sorts of settings, and the only thing that was stable, and I’m not just talking about lack of GPU soft lockups, was the older amd-oc.

It could be caused by a crashed card after too low voltage or too high overclock. The cards ran at roughly 950mV before the patch and didn’t go lower, regardless of the value in the web dashboard. The only thing that is changed is that all voltage and core states are overridden - it shouldn’t affect the fan at all. So I’m trying to understand why your cards are crashing while everyone else’s aren’t.

Yep, this should really be left as an option for the meantime; then for v0.6, OC changes can be managed in the web UI as well…

I’ll try to mess around with amd-oc this weekend… my 6-GPU test bench should be enough, heh. Hopefully it won’t take too much time to get it right, time is $